PORPHYROMONAS GINGIVALIS ARGININE-SPECIFIC PROTEINASE

CODING SEQUENCES 

This invention was made, at least in part, with funding from
the National Institutes of Health (Grant Nos. DE 09761, HL 26148
and HL 37090). Accordingly, the United States Government may
have certain rights in this invention.

FIELD OF THE INVENTION 

The field of this invention is bacterial proteases, more
particularly those of Porphyromonas gingivalis, most particularly
the arginine-specific protease termed Arg-gingipain herein and
the nucleotide sequences encoding same.

BACKGROUND OF THE INVENTION 

Porphyromonas gingivalis (formerly Bacteroides gingivalis)
is an obligately anaerobic bacterium which is implicated in
periodontal disease. P. gingivalis produces proteolytic enzymes
in relatively large quantities; these proteinases are recognized
as important virulence factors. A number of physiologically
significant proteins, including collagen, fibronectin,
immunoglobulins, complement factors C3, C4, C5, and B, lysozyme,
iron-binding proteins, plasma proteinase inhibitors, fibrin and
fibrinogen, and key factors of the plasma coagulation cascade
system, are hydrolyzed by proteinases from this microorganism.
Such broad proteolytic activity may play a major role in the
evasion of host defense mechanisms and the destruction of
gingival connective tissue associated with progressive
periodontitis (Saglie et al. (1988) J. Periodontol. 59, 259-265).

There are conflicting data as to the number and types of
proteinases produced by P. gingivalis. In the past, proteolytic
activities of P. gingivalis were classified into two groups:
those enzymes which specifically degraded collagen and the
general "trypsin-like" proteinases which appeared to be
responsible for other proteolytic activity. Trypsin (and
trypsin-like proteases) cleaves after arginine or lysine in the
substrates (See, e.g., Lehninger A. L. (1982), Principles of
Biochemistry, Worth Publishing, Inc., New York). Although many
attempts have been made to separate one of these trypsin-like
proteinases, Chen et al. (1992) J. Biol. Chem. 267, 18896-18901
reported the first rigorous purification and biochemical and
enzymological characterization for an arginine-specific P.
gingivalis protease.

This application reports the purification of 50 kDa and
high molecular weight trypsin-like, thiol-activated proteinases
of P. gingivalis and nucleotide sequences encoding same.

SUMMARY OF THE INVENTION 

An object of the present invention is to provide a
nucleotide sequence encoding a low molecular weight
Arg-gingipain, termed Arg-gingipain-1 (or gingipain-1) herein, said
gingipain-1 having an apparent molecular mass of 50 kDa as
estimated by sodium dodecyl sulfate polyacrylamide gel
electrophoresis and an apparent molecular mass of 44 kDa as
estimated by gel filtration chromatography, said gingipain-1
having amidolytic and proteolytic activity for cleavage after
arginine residues and having no amidolytic and/or proteolytic
activity for cleavage after lysine residues, wherein the
amidolytic and/or proteolytic activity is inhibited by cysteine
protease group-specific inhibitors including iodoacetamide,
iodoacetic acid, N-ethylmaleimide, leupeptin, antipain,
trans-epoxysuccinyl-L-leucylamido-(4-guanidino)butane, TLCK, TPCK,
p-aminobenzamidine, N-chlorosuccinamide, and chelating agents
including EDTA and EGTA, wherein the amidolytic and/or
proteolytic activity of said gingipain-1 is not sensitive to
inhibition by human cystatin C, α2-macroglobulin, α1-proteinase
inhibitor, antithrombin III, α2-antiplasmin, serine protease
group-specific inhibitors including diisopropylfluorophosphate,
phenylmethylsulfonylfluoride and 3,4-dichloroisocoumarin, and
wherein the amidolytic and/or proteolytic activities of
gingipain-1 are stabilized by Ca2+ and wherein the amidolytic
and/or proteolytic activities of said gingipain-1 are stimulated
by glycine-containing peptides and glycine analogues. In a
specifically exemplified gingipain-1 protein, the protein is
characterized by an N-terminal amino acid sequence as given in
SEQ ID NO:1 (Tyr-Thr-Pro-Val-Glu-Glu-Lys-Gln-Asn-Gly-Arg-Met-Ile-
Val-Ile-Val-Ala-Lys-Lys-Tyr-Glu-Gly-Asp-Ile-Lys-Asp-Phe-Val-Asp-
Trp-Lys-Asn-Gln-Arg-Gly-Leu-Thr-Lys-Xaa-Val-Lys-Xaa-Ala) and by
a C-terminal amino acid sequence as given in SEQ ID NO:6
(Glu-Leu-Leu-Arg).

A further object of this invention is a nucleotide sequence
encoding a high molecular weight form of Arg-gingipain, termed
Arg-gingipain-2 herein, which comprises a proteolytic component
essentially as described hereinabove and at least one
hemagglutinin component.

As specifically exemplified, the encoded Arg-gingipain-
hemagglutinin complex is transcribed as a prepolyprotein, with
the amino acid sequence as given in SEQ ID NO:11 from amino acid
1-1704. The encoded mature high molecular weight Arg-gingipain
protein has a protease component having a complete deduced amino
acid sequence as given in SEQ ID NO:11 from amino acid 228
through amino acid 719. An alternative protease component amino
acid sequence is given in SEQ ID NO:4, amino acids 1-510.
Arg-gingipain-2 further comprises at least one hemagglutinin
component. The hemagglutinin components which are found
associated with the 50 kDa Arg-specific proteolytic component are
44 kDa, 27 kDa and 17 kDa, and have amino acid sequences as given
in SEQ ID NO:11, from 720 to 1091, from 1092 to 1429 and from
1430 to 1704, respectively.

It is an additional object of the invention to provide
nucleic acid molecules for the recombinant production of an
Arg-gingipain. Substantially pure recombinant Arg-gingipain-1
protein can be prepared after expression of the nucleotide
sequences encoding Arg-gingipain in a heterologous host cell
using the methods disclosed herein. Said substantially pure
Arg-gingipain-1 exhibits amidolytic and/or proteolytic activity with
specificity for cleavage after arginine, but exhibits no
amidolytic and/or proteolytic activity with specificity for
cleavage after lysine residues. The purification method
exemplified herein comprises the steps of precipitating
extracellular protein from cell-free culture supernatant of
Porphyromonas gingivalis with ammonium sulfate (90% w/v
saturation), fractionating the precipitated proteins by gel
filtration, further fractionating by anion exchange
chromatography those proteins in the fractions from gel
filtration with the highest specific activity for amidolytic
activity as measured with benzoyl-L-arginyl-p-nitroanilide and
collecting those proteins which were not bound to the anion
exchange column, fractionating those proteins by FPLC over
a cation exchange column (MonoS HR5/5, Pharmacia, Piscataway, NJ)
and finally separating gingipain-1 from lysine-specific
proteolytic/amidolytic protein(s) by affinity chromatography over
L-arginyl-agarose. Preferably the P. gingivalis used is strain
H66, and preferably the culture is grown to early stationary
phase. Arg-gingipain-1 can also be purified from cells using
appropriate modifications of the foregoing procedures (cells must
be disrupted, e.g., by lysis in a French pressure cell).
Preferably the gel filtration step is carried out using Sephadex
G-150, the anion exchange chromatography step is carried out
using diethylaminoethyl (DEAE)-cellulose, the FPLC step is
carried out using Mono S, and the affinity chromatography is
carried out using L-arginyl-Sepharose 4B.

It is a further object of this invention to provide
recombinant polynucleotides (e.g., a recombinant DNA molecule)
comprising a nucleotide sequence encoding an Arg-gingipain
protein, preferably having an amino acid sequence as given in SEQ
ID NO:11 from amino acid 228 through amino acid 719 or having an
amino acid sequence as given in SEQ ID NO:4, amino acids 1
through 510. As specifically exemplified herein, the nucleotide
sequence encoding a mature Arg-gingipain protease is given in SEQ
ID NO:10, nucleotides 1630 through 3105, or SEQ ID NO:3 from
nucleotides 1630 through 3105. The skilled artisan will
understand that the amino acid sequence of the exemplified
gingipain protein can be used to identify and isolate additional,
nonexemplified nucleotide sequences which will encode a
functional protein of the same amino acid sequence as given in
SEQ ID NO:4 from amino acid 1 through amino acid 510 or an amino
acid sequence of greater than 90% identity and having equivalent
biological activity. The skilled artisan understands that it may
be desirable to express the Arg-gingipain as a secreted protein;
if so, the artisan knows how to modify the exemplified coding
sequence for the "mature" gingipain-2 by adding a nucleotide
sequence encoding a signal peptide appropriate to the host in
which the sequence is expressed. When it is desired that the
sequence encoding an Arg-gingipain protein be expressed, the
skilled artisan will operably link transcriptional and
translational regulatory sequences to the coding sequences, with
the choice of the regulatory sequences being determined by the
host in which the coding sequence is to be expressed. With
respect to a recombinant DNA molecule carrying an Arg-gingipain
coding sequence, the skilled artisan will choose a vector (such
as a plasmid or a viral vector) which can be introduced into and
which can replicate in the host cell. The host cell can be a
bacterium, preferably Escherichia coli, or a yeast or mammalian
cell.
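
By way of illustration only, the "greater than 90% identity"
criterion mentioned above can be checked with a few lines of
Python. This is a minimal sketch, assuming the two amino acid
sequences have already been aligned to equal length by some
external tool; the function names and the treatment of gap
characters are assumptions made here, not part of the disclosure.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent identity between two pre-aligned sequences of equal length.

    Assumption: '-' marks an alignment gap. Columns where both sequences
    have a gap are skipped; a gap opposite a residue counts as a mismatch.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" and b == "-":
            continue
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * matches / compared


def exceeds_identity_threshold(seq_a: str, seq_b: str,
                               threshold: float = 90.0) -> bool:
    """True when identity is strictly greater than the threshold."""
    return percent_identity(seq_a, seq_b) > threshold
```

Because the criterion is "greater than" 90%, a pair that is exactly
90% identical does not pass under this sketch. Equivalent
biological activity would still have to be established
experimentally.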

Also provided is a specific exemplification of a nucleotide
sequence encoding an Arg-gingipain, including the low molecular
weight gingipain-1 protease component and the protease component
of high molecular weight gingipain and its associated
hemagglutinin components. These components are processed from
a prepolyprotein. As specifically exemplified, the coding



WO 95/07286 PCT/US94/10283 



sequence, from nucleotide 949 to nucleotide 6063 in SEQ ID NO:10,
including the stop codon, encodes a prepolyprotein having an
amino acid sequence as given in SEQ ID NO:11. The prepolyprotein
is encoded by a nucleotide sequence as given in SEQ ID NO:10 from
nucleotide 949 to 6063. The mature protease molecule is encoded
at nucleotides 1630 through 3105 in SEQ ID NO:10. The mature
Arg-specific proteolytic component has an amino acid sequence as
given in SEQ ID NO:11 from 228-719, and the hemagglutinin component
has an amino acid sequence as in SEQ ID NO:11 from 720-1091, from
1092 to 1429 or from 1430 to 1704.
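
The residue and nucleotide coordinates recited above are related
by simple codon arithmetic. The following Python sketch shows that
relationship. The coordinates are taken from the text, while the
helper names are illustrative only. It assumes an uninterrupted
open reading frame beginning at nucleotide 949.

```python
CDS_START_NT = 949               # first coding nucleotide in SEQ ID NO:10
PREPOLYPROTEIN_LENGTH_AA = 1704  # residues in SEQ ID NO:11

# Residue ranges in SEQ ID NO:11 (1-based, inclusive), as recited in the text.
COMPONENTS = {
    "Arg-specific protease": (228, 719),
    "44 kDa hemagglutinin": (720, 1091),
    "27 kDa hemagglutinin": (1092, 1429),
    "17 kDa hemagglutinin": (1430, 1704),
}


def aa_range_to_nt(first_aa, last_aa, cds_start=CDS_START_NT):
    """Map a 1-based inclusive residue range to 1-based inclusive nucleotides."""
    nt_first = cds_start + 3 * (first_aa - 1)
    nt_last = cds_start + 3 * last_aa - 1
    return nt_first, nt_last


def component_table():
    """Return (name, residue count, nt_first, nt_last) for each component."""
    rows = []
    for name, (first_aa, last_aa) in COMPONENTS.items():
        nt_first, nt_last = aa_range_to_nt(first_aa, last_aa)
        rows.append((name, last_aa - first_aa + 1, nt_first, nt_last))
    return rows


if __name__ == "__main__":
    for row in component_table():
        print("%-24s %4d aa  nt %d-%d" % row)
    # The stop codon occupies the three nucleotides after the last sense codon.
    last_sense_nt = aa_range_to_nt(1, PREPOLYPROTEIN_LENGTH_AA)[1]
    print("stop codon: nt %d-%d" % (last_sense_nt + 1, last_sense_nt + 3))
```

Under these assumptions, residues 228-719 map to nucleotides
1630-3105 and the stop codon ends at nucleotide 6063, in agreement
with the coordinates stated above.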

In another embodiment, recombinant polynucleotides which
encode an Arg-gingipain, including, e.g., protein fusions or
deletions, as well as expression systems are provided.
Expression systems are defined as polynucleotides which, when
transformed into an appropriate host cell, can express a
proteinase. The recombinant polynucleotides possess a nucleotide
sequence which is substantially similar to a natural
Arg-gingipain-encoding polynucleotide or a fragment thereof.

The polynucleotides include RNA, cDNA, genomic DNA,
synthetic forms, and mixed polymers, both sense and antisense
strands, and may be chemically or biochemically modified or
contain non-natural or derivatized nucleotide bases. DNA is
preferred. Recombinant polynucleotides comprising sequences
otherwise not naturally occurring are also provided by this
invention, as are alterations of a wild type proteinase sequence,
including but not limited to deletion, insertion or substitution
of one or more nucleotides, or fusion to other polynucleotide
sequences.

The present invention also provides for fusion polypeptides
comprising an Arg-gingipain. Homologous polypeptides may be
fusions between two or more proteinase sequences or between the
sequences of a proteinase and a related protein. Likewise,
heterologous fusions may be constructed which would exhibit a
combination of properties or activities of the proteins from
which they are derived. Fusion partners include but are not
limited to immunoglobulins, ubiquitin, bacterial β-galactosidase,
trpE, protein A, β-lactamase, alpha amylase, alcohol
dehydrogenase and yeast alpha mating factor (Godowski et al.
(1988) Science 241, 812-816). Fusion proteins will typically
be made by recombinant methods but may be chemically synthesized.

Compositions and immunogenic preparations including but not
limited to vaccines, comprising recombinant Arg-gingipain derived
from P. gingivalis and a suitable carrier therefor are provided.
Such vaccines are useful, for example, in immunizing an animal,
including humans, against inflammatory response and tissue damage
caused by P. gingivalis in periodontal disease. The vaccine
preparations comprise an immunogenic amount of a proteinase or
an immunogenic fragment or subunit thereof. Such vaccines may
comprise one or more Arg-gingipain proteinases, or an
Arg-gingipain in combination with another protein or other immunogen.
By "immunogenic amount" is meant an amount capable of eliciting
the production of antibodies directed against one or more
Arg-gingipains in an individual to which the vaccine has been
administered.

BRIEF DESCRIPTION OF THE FIGURES 

Figure 1 illustrates the composite physical map of an
Arg-gingipain locus. The first codon of the mature Arg-gingipain
proteolytic component is indicated. Only major restriction sites
employed in cloning are indicated: B, BamHI; P, PstI; S, SmaI;
A, Asp718; Pv, PvuII; H, HindIII. The four arginine cleavage
sites (R227, R719, R1091 and R1429) are each indicated with an
asterisk (*). The three residues forming the active site (C412,
H438 and N669, respectively) are also shown.

Figure 2 is a protein matrix plot, which presents analysis
of regions of similarity between hemagglutinin domains using
Pustell Protein Matrix from MacVector, Release 4.0. The complete
prepolyprotein sequence (SEQ ID NO:11) was used as X-axis and
Y-axis. The perfect diagonal row is the line of identity, whereas
structure in the pattern near that diagonal corresponds to
internal repeats. The four different domains are represented
(Arg-gingipain protease, 44 kDa hemagglutinin, 17 kDa
hemagglutinin and 27 kDa hemagglutinin). Four regions of high
homology are identified. The main homologies between
hemagglutinin domains are shown in detail in Table 4.
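
The idea behind a self-comparison matrix plot of this kind can be
sketched in Python as follows. This is a simplified identity-based
dot matrix, not the Pustell algorithm as implemented in MacVector.
The window size, match threshold and function name are
hypothetical choices made for illustration.

```python
def dot_matrix(seq_x, seq_y, window=8, min_matches=6):
    """Return (i, j) window start positions (0-based) where the two windows
    share at least `min_matches` identical residues.

    Simplification: residues are scored by identity only; a real protein
    matrix plot would typically use a substitution scoring matrix.
    """
    hits = []
    for i in range(len(seq_x) - window + 1):
        window_x = seq_x[i:i + window]
        for j in range(len(seq_y) - window + 1):
            window_y = seq_y[j:j + window]
            score = sum(1 for a, b in zip(window_x, window_y) if a == b)
            if score >= min_matches:
                hits.append((i, j))
    return hits


# Toy sequence (hypothetical) with the same 10-residue block at both ends.
TOY = "MKTAYIAKQR" + "GGSGG" + "MKTAYIAKQR"
TOY_HITS = dot_matrix(TOY, TOY, window=8, min_matches=8)
```

When a sequence is compared with itself, every position on the
main diagonal is a hit (the line of identity), and internal
repeats show up as off-diagonal hits, here at offsets of 15
residues.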

DETAILED DESCRIPTION OF THE INVENTION 

Abbreviations used herein for amino acids are standard in
the art: X or Xaa represents an amino acid residue that has not
yet been identified but may be any amino acid residue including
but not limited to phosphorylated tyrosine, threonine or serine,
as well as cysteine or a glycosylated amino acid residue. The
abbreviations for amino acid residues as used herein are as
follows: A, Ala, alanine; V, Val, valine; L, Leu, leucine; I,
Ile, isoleucine; P, Pro, proline; F, Phe, phenylalanine; W, Trp,
tryptophan; M, Met, methionine; G, Gly, glycine; S, Ser, serine;
T, Thr, threonine; C, Cys, cysteine; Y, Tyr, tyrosine; N, Asn,
asparagine; Q, Gln, glutamine; D, Asp, aspartic acid; E, Glu,
glutamic acid; K, Lys, lysine; R, Arg, arginine; and H, His,
histidine. Other abbreviations used herein include Bz, benzoyl;
Cbz, carboxybenzoyl; pNA, p-nitroanilide; MeO, methoxy; Suc,
succinyl; OR, ornithyl; Pip, pipecolyl; SDS, sodium dodecyl
sulfate; TLCK, tosyl-L-lysine chloromethyl ketone; TPCK,
tosyl-L-phenylalanine chloromethyl ketone; S-2238,
D-Phe-Pip-Arg-pNA; S-2222, Bz-Ile-Glu-(γ-OR)-Gly-Arg-pNA; S-2288,
D-Ile-Pro-Arg-pNA; S-2251, D-Val-Leu-Lys-pNA; Bis-Tris,
2-[bis(2-hydroxyethyl)amino]-2-(hydroxymethyl)-propane-1,3-diol;
FPLC, fast protein liquid chromatography; HPLC, high performance
liquid chromatography; Tricine,
N-[2-hydroxy-1,1-bis(hydroxymethyl)ethyl]glycine; EGTA,
[ethylene-bis(oxyethylene-nitrilo)]tetraacetic acid; EDTA,
ethylenediamine-tetraacetic acid; Z-L-Lys-pNA,
Z-L-lysine-p-nitroanilide; HMW, high molecular weight.

Arg-gingipain is the term given to a P. gingivalis enzyme
with specificity for proteolytic and/or amidolytic activity for
cleavage of an amide bond, in which L-arginine contributes the
carboxyl group. The Arg-gingipains described herein have
identifying characteristics of cysteine dependence, inhibition
response as described, Ca2+-stabilization and glycine
stimulation. Particular forms of Arg-gingipain are distinguished
by their apparent molecular masses of the mature proteins (as
measured without boiling before SDS-PAGE). Arg-gingipains of the
present invention have no amidolytic or proteolytic activity for
amide bonds in which L-lysine contributes the -COOH moiety.

Arg-gingipain-1 is the name given herein to a protein
characterized as having a molecular mass of 50 kDa as measured
by SDS-PAGE and 44 kDa as measured by gel filtration over
Sephadex G-150; having amidolytic and/or proteolytic activity for
substrates having L-Arg in the P1 position, i.e. on the
N-terminal side of the peptide bond to be hydrolyzed, but having no
activity against corresponding lysine-containing substrates; being
dependent on cysteine (or other thiol groups) for full activity;
having sensitivity to cysteine protease group-specific inhibitors
including iodoacetamide, iodoacetic acid, and N-ethylmaleimide,
leupeptin, antipain,
trans-epoxysuccinyl-L-leucylamido-(4-guanidino)butane, TLCK, TPCK,
p-aminobenzamidine, N-chlorosuccinamide, and chelating agents
including EDTA and EGTA, but being resistant to inhibition by
human cystatin C, α2-macroglobulin, α1-proteinase inhibitor,
antithrombin III, α2-antiplasmin, serine protease group-specific
inhibitors including diisopropylfluorophosphate,
phenylmethylsulfonylfluoride and 3,4-dichloroisocoumarin; and
wherein the amidolytic and/or proteolytic activities of
gingipain-1 are stabilized by Ca2+ and wherein the amidolytic
and/or proteolytic activities of said gingipain-1 are stimulated
by glycine-containing peptides and glycine analogues.

An exemplified Arg-gingipain described and termed
Arg-gingipain-2 herein exists in the native form in a high molecular
weight form, having an apparent molecular mass of 95 kDa as
determined by SDS-PAGE, without boiling of samples. When boiled,
the high molecular weight form appears to dissociate into
components of 50 kDa, 43 kDa, 27 kDa and 17 kDa. Arg-gingipain-2
is the name given to the 50 kDa, enzymatically active component
of the high molecular weight complex.

The complete amino acid sequence of an exemplified mature
Arg-gingipain is given in SEQ ID NO:11, from amino acid 228
through amino acid 719. A second possible exemplary amino acid
sequence is given in SEQ ID NO:4, amino acids 1 through 510. In
nature these proteins are produced by the bacterium
Porphyromonas gingivalis; they can be purified from cells or from
culture supernatant or as a recombinant expression product using
the methods provided herein. Without wishing to be bound by any
theory, it is proposed that these sequences correspond to
Arg-gingipain-2.

As used herein with respect to Arg-gingipain-1, a
substantially pure Arg-gingipain preparation means that there is
only one protein band visible after silver-staining an SDS
polyacrylamide gel run with the preparation, and the only
amidolytic and/or proteolytic activities are those with
specificity for L-arginine in the P1 position relative to the
bond cleaved. A substantially pure high molecular weight
Arg-gingipain preparation has only one band (95 kDa) on SDS-PAGE
(sample not boiled) or four bands (50 kDa, 43 kDa, 27 kDa, 17
kDa; sample boiled). No amidolytic or proteolytic activity for
substrates with lysine in the P1 position is evident in a
substantially pure high molecular weight or Arg-gingipain-2
preparation. Furthermore, a substantially pure preparation of
Arg-gingipain has been separated from components with which it
occurs in nature. Substantially pure Arg-gingipain is
substantially free of naturally associated components when
separated from the native contaminants which accompany it in
its natural state. Thus, Arg-gingipain that is chemically
synthesized or recombinantly synthesized in a cellular system
different from the cell from which it naturally originates will
be substantially free from its naturally associated components.
Techniques for synthesis of polypeptides are described, for
example, in Merrifield (1963) J. Amer. Chem. Soc. 85, 2149-2156.

A chemically synthesized Arg-gingipain protein is considered
an "isolated" polypeptide, as is an Arg-gingipain produced as an
expression product of an isolated proteinase-encoding
polynucleotide which is part of an expression vector (i.e., a
"recombinant proteinase"), even if expressed in a homologous cell
type.

Recombinant Arg-gingipain-1, Arg-gingipain-2 and HMW
Arg-gingipain can be obtained by culturing host cells transformed
with the recombinant polynucleotides comprising nucleotide
sequences encoding an Arg-gingipain as described herein under
conditions suitable to attain expression of the
proteinase-encoding sequence.

Example 1 below and Chen et al. (1992) supra describe the
purification of Arg-gingipain-1 and HMW Arg-gingipain from P.
gingivalis culture supernatant, i.e., from a natural source.
The isolation of an Arg-gingipain from other biological
material, such as from nonexemplified strains of P.
gingivalis or from cells transformed with recombinant
polynucleotides encoding such proteins, may be accomplished by
methods known in the art. Various methods of protein
purification are known in the art, including those described,
e.g., in Guide to Protein Purification, ed. Deutscher, Vol. 182
of Methods in Enzymology (Academic Press, Inc.: San Diego, 1990)
and Scopes, Protein Purification: Principles and Practice
(Springer-Verlag: New York, 1982).

Chromatography over Sephadex G-150 yielded four peaks with
Bz-L-Arg-pNA-hydrolyzing activity. In each of these fractions,
the hydrolytic activity was dependent on cysteine and enhanced
many-fold by the addition of glycyl-glycine or glycine amide.
Antibody specific for Arg-gingipain-1 immunoprecipitates
proteinase from all four Sephadex G-150 peaks. Without wishing
to be bound by any particular theory, it is postulated that the
four-peak Bz-L-Arg-pNA-amidolytic profile is an anomaly resulting
from the binding of gingipain-1 to membrane or nucleic acid
fragments. Alternatively, those peaks containing higher
molecular weight protein may contain partially processed
gingipain-1 precursors. Although the purification of gingipain-1
as exemplified is from extracellular protein, it can also be
purified from the bacterial cells.

Further analysis (see Example 1) of the high molecular
weight fractions containing Arg-specific amidolytic and
proteolytic activity revealed that Arg-gingipain-2 (50 kDa)
occurred non-covalently bound to proteins of 44 kDa, 27 kDa and
17 kDa, which have hemagglutinin activity. The empirically
determined N-terminal amino acid sequence of the complexed 44 kDa
protein corresponds to amino acids 720-736 of SEQ ID NO:11.

Arg-gingipain-1 was further purified from the Sephadex G-150
Peak 4 protein mixture by further steps of anion exchange
chromatography over DEAE-cellulose and two runs over Mono S FPLC.
Arg-gingipain-1 recovery was markedly reduced if an affinity
chromatography step (L-arginyl-Sepharose 4B) was used to remove
trace amounts of a contaminating proteinase with specificity for
cleavage after lysine residues.

Purified Arg-gingipain-1 exhibits an apparent molecular mass
of about 50 kDa as determined by SDS-polyacrylamide gel
electrophoresis. The size estimate obtained by gel filtration
on Superose 12 (Pharmacia, Piscataway, NJ) is 44 kDa.
Amino-terminal sequence analysis through 43 residues gave a unique
structure which showed no homology with any other proteins, based
on a comparison in the protein NBRF data base, release 39.0. The
sequence obtained is as follows:

Tyr-Thr-Pro-Val-Glu-Glu-Lys-Gln-Asn-Gly-Arg-Met-Ile-Val-Ile-Val-
Ala-Lys-Lys-Tyr-Glu-Gly-Asp-Ile-Lys-Asp-Phe-Val-Asp-Trp-Lys-Asn-
Gln-Arg-Gly-Leu-Thr-Lys-Xaa-Val-Lys-Xaa-Ala (SEQ ID NO:1).

The C-terminal amino acid sequence of the gingipain-1 (major form
recognized in zymography SDS-PAGE, 0.1% gelatin in gel) was
found to be Glu-Leu-Leu-Arg (SEQ ID NO:5). This corresponds to
the amino acids 716-719 in SEQ ID NO:4 and nucleotides 3094-3105
in SEQ ID NO:3. This is consistent with the model for
autoproteolytic processing of the precursor polyprotein to
produce the mature 50 kDa gingipain-1 protein.
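
For comparison against sequence databases, the three-letter
N-terminal sequence above is conveniently converted to one-letter
code. A minimal Python sketch follows, using the abbreviations
defined earlier in this description. The variable and function
names are illustrative assumptions.

```python
# Three-letter to one-letter codes, following the abbreviations defined herein.
THREE_TO_ONE = {
    "Ala": "A", "Val": "V", "Leu": "L", "Ile": "I", "Pro": "P",
    "Phe": "F", "Trp": "W", "Met": "M", "Gly": "G", "Ser": "S",
    "Thr": "T", "Cys": "C", "Tyr": "Y", "Asn": "N", "Gln": "Q",
    "Asp": "D", "Glu": "E", "Lys": "K", "Arg": "R", "His": "H",
    "Xaa": "X",  # residue not yet identified
}

# N-terminal sequence of gingipain-1 as given in the text (SEQ ID NO:1).
SEQ_ID_NO_1 = (
    "Tyr-Thr-Pro-Val-Glu-Glu-Lys-Gln-Asn-Gly-Arg-Met-Ile-Val-Ile-Val-"
    "Ala-Lys-Lys-Tyr-Glu-Gly-Asp-Ile-Lys-Asp-Phe-Val-Asp-Trp-Lys-Asn-"
    "Gln-Arg-Gly-Leu-Thr-Lys-Xaa-Val-Lys-Xaa-Ala"
)


def three_to_one(sequence: str) -> str:
    """Convert a hyphen-separated three-letter sequence to one-letter code."""
    return "".join(THREE_TO_ONE[residue] for residue in sequence.split("-"))


ONE_LETTER = three_to_one(SEQ_ID_NO_1)
```

The converted string has 43 residues, matching the 43 cycles of
sequence analysis reported above, with the two unidentified
positions carried through as X.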

Comparison of SEQ ID NO:1 with SEQ ID NO:4 and 11 shows
differences at amino acids 37-38 of the mature Arg-gingipain.
Without wishing to be bound by any theory, it is proposed that
SEQ ID NO:3 (or SEQ ID NO:10) comprises the coding sequence for
Arg-gingipain-2, the enzymatically active component of the high
molecular weight form of Arg-gingipain. This is consistent with
the observation that there are at least two genes with
substantial nucleic acid homology to the Arg-gingipain-specific
probe.

The enzymatic activity of Arg-gingipain-1 is stimulated by
glycine and glycine-containing compounds. In the absence of a
glycine-containing compound, the enzyme has essentially the same
amidolytic activity across the pH range 7.5-9.0. However, in the
presence of, e.g., glycyl-glycine, substantial sharpening of the
pH range for activity is observed, with the optimum being between
pH 7.4 and 8.0. Preliminary kinetic data indicate that the
effect of glycine and glycine analogues is to raise both kcat and
Km equally so that the kcat/Km ratio does not change. It is
therefore likely that these compounds bind to the enzyme and/or
substrate after an enzyme-substrate complex has already formed.
The high molecular weight form is stimulated only about half as
much by glycine compounds.
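
The kinetic consequence of raising kcat and Km by the same factor
can be illustrated with the Michaelis-Menten rate law. The Python
sketch below uses hypothetical parameter values and a hypothetical
four-fold factor; none of the numbers are measured values from
this work.

```python
def mm_rate(substrate, enzyme, kcat, km):
    """Michaelis-Menten initial rate: v = kcat * [E] * [S] / (Km + [S])."""
    return kcat * enzyme * substrate / (km + substrate)


def apply_activator(kcat, km, factor):
    """Scale kcat and Km by the same factor, as the preliminary data suggest."""
    return kcat * factor, km * factor


# Hypothetical values chosen only for illustration (arbitrary units).
KCAT, KM, FACTOR = 10.0, 0.1, 4.0
KCAT_ACT, KM_ACT = apply_activator(KCAT, KM, FACTOR)


def fold_stimulation(substrate, enzyme=1.0):
    """Ratio of the rate with activator to the rate without it."""
    stimulated = mm_rate(substrate, enzyme, KCAT_ACT, KM_ACT)
    basal = mm_rate(substrate, enzyme, KCAT, KM)
    return stimulated / basal
```

With these assumed values the kcat/Km ratio is unchanged, so the
rate at substrate concentrations far below Km is essentially
unaffected, whereas the rate at saturating substrate approaches
the full four-fold increase.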

Arg-gingipain-1 requires cysteine for full amidolytic
activity, and, although it is stimulated by other thiol-containing
compounds, the effect is less pronounced. Cysteine and
cysteamine are most efficient, presumably because they perform
the dual roles of reducing agents and glycine analogues.

The amidolytic activity of Arg-gingipain-1 is inhibited by
a number of -SH group blocking reagents, oxidants, Ca2+ chelating
agents, and Zn2+. The effect of chelating agents EDTA and EGTA
was reversed completely by the addition of excess Ca2+, whereas
in the case of Zn2+, it was necessary to add o-phenanthroline
prior to Ca2+.

Typical serine proteinase group-specific inhibitors have no
effect on enzyme activity, and it is likely that inhibition by
both TLCK and TPCK was caused by reaction with an essential
cysteine residue in the enzyme, a known property of chloromethyl
ketone derivatives. Significantly, Arg-gingipain-1 was inhibited
by such cysteine proteinase inhibitors as
trans-epoxysuccinyl-L-leucylamido-(4-guanidino)butane, leupeptin
and antipain. Although the reactions were not stoichiometric, the
inhibition was concentration-dependent. However, human cystatin C,
an inhibitor of mammalian and plant cysteine proteinases, did not
inhibit Arg-gingipain-1, nor did any of the trypsin-specific
inhibitors from human plasma, including α2-macroglobulin,
α1-proteinase inhibitor, antithrombin III, and α2-antiplasmin.
Indeed, preliminary investigations actually suggested that the
inhibitor in each case was being inactivated by Arg-gingipain-1.

Calcium ion stabilizes Arg-gingipain-1 without directly
affecting activity. With Ca2+ present the enzyme is stable in
the pH range between 4.5 and 7.5 for several days at 4°C.
However, below pH 4.0 or in the absence of Ca2+, enzyme activity
is quickly lost. At 37°C Ca2+ considerably increases stability,
although activity is lost more rapidly than at the lower
temperature. At -20°C Arg-gingipain-1 is stable for several
months. During lyophilization, however, it irreversibly loses
more than 90% of its catalytic activity.




The amidolytic activity of the purified Arg-gingipain-1 on
synthetic peptide substrates was limited to substrates with a
P1-Arg residue. Even then Arg-gingipain-1 had significantly
different turnover rates on individual substrates, being most
effective against S-2238 (D-Phe-Pip-Arg-pNA) and S-2222
(Bz-Ile-Glu-(γ-OR)-Gly-Arg-pNA). Lesser, comparable activity was
observed using S-2288 (D-Ile-Pro-Arg-pNA) and Bz-Arg-pNA.
D-Val-Leu-Lys-pNA (S-2251), Suc-Ala-Ala-Ala-pNA,
MeO-Suc-Ala-Ala-Pro-Val-pNA, Suc-Ala-Ala-Pro-Phe-pNA, Gly-Pro-pNA
and Cbz-Phe-Leu-Glu-pNA had essentially no substrate activity.
This narrow specificity was confirmed by examination of the
cleavage products after incubation with the insulin B chain or
melittin; it was found that cleavage occurred only after Arg
residues, and not after Lys or any other amino acids, unless the
last affinity chromatography step over L-arginine-Sepharose 4B
was omitted.
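
The difference between Arg-specific cleavage and the broader
trypsin-like pattern (cleavage after Arg or Lys) can be sketched
in Python as follows. The peptide is a hypothetical toy sequence,
not one of the substrates tested. The sketch also ignores any
influence of neighboring residues on cleavage.

```python
def cleave_after(sequence: str, residues: str = "R"):
    """Split a one-letter peptide sequence after each listed residue.

    residues="R" models strict Arg specificity; residues="RK" models a
    simplified trypsin-like specificity. No cut is made after the final
    residue of the peptide.
    """
    fragments = []
    start = 0
    for i, residue in enumerate(sequence):
        if residue in residues and i < len(sequence) - 1:
            fragments.append(sequence[start:i + 1])
            start = i + 1
    fragments.append(sequence[start:])
    return fragments


TOY_PEPTIDE = "AGRKLMRGKT"  # hypothetical peptide with two Arg and two Lys
ARG_ONLY = cleave_after(TOY_PEPTIDE, "R")
TRYPSIN_LIKE = cleave_after(TOY_PEPTIDE, "RK")
```

For the toy peptide, Arg-only cleavage yields three fragments
(AGR, KLMR, GKT) while the trypsin-like rule yields five. The
appearance of additional Lys-terminated fragments in a digest
would therefore point to a contaminating Lys-specific activity.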

Because progressive periodontitis is characterized by tissue
degradation, collagen destruction and a strong inflammatory
response, and because P. gingivalis was known to exhibit
complement-hydrolyzing activity, purified Arg-gingipain-1 was
tested for proteinase activity using purified human complement
C3 and C5 as substrates (See Wingrove et al. (1992) J. Biol.
Chem. 267, 18902-18907). Low molecular weight Arg-gingipain
selectively cleaved the α-chain, generating what initially
appeared to be the α'-chain of C3b. Further breakdown fragments
of the C3 α'-chain were observed and a decreasing intensity of
the α'-band suggested that degradation continued. Visual
evidence suggested that the C3 β-chain is resistant to this
proteinase. Attempts to demonstrate C3a biological activity in
the C3 digestion mixture were unsuccessful, and the C3a-like
fragment released from the α-chain was extensively degraded by
Arg-gingipain-1.

Human C5 was also digested by Arg-gingipain-1, with initial
cleavage specific for the C5 α-chain, as in the case of C3. The
α-1 (86 kDa) and the α-2 (30 kDa) fragments were the first
polypeptides to be formed from cleavage of C5 by gingipain-1, and
together they equal the molecular weight of the intact α-chain.
A fragment in the size range of C5a was also observed. C5a is
more resistant to Arg-gingipain-1 than C3a, and functional C5a
may accumulate without further appreciable degradation. C5a
biological activity was detected after digestion of human C5 with
Arg-gingipain-1. Characteristic morphologic changes in human
neutrophils, known as polarization, were scored by counting
deformed cells relative to normally rounded cells.

To test for in vivo biological activity, the purified low
molecular weight Arg-gingipain enzyme was injected into guinea
pig skin. It induced vascular permeability enhancement at
concentrations greater than 10^-8 M in dose-dependent and
proteolytic activity-dependent manners. Vascular permeability
enhancement activity was not inhibited by diphenhydramine (an
antihistamine), and the activity was enhanced by SQ 20,881
(angiotensin-converting enzyme inhibitor). The vascular
permeability enhancement by Arg-gingipain-1 was inhibited by
soybean trypsin inhibitor (SBTI) at a concentration of 10^-5 M, a
concentration at which SBTI did not inhibit enzymatic activity,
as measured with Bz-L-Arg-pNA and azocasein as the substrates.

Human plasma or guinea pig plasma treated with Arg- 
gingipain-1 (10" R to 10* M) induced vascular permeability 
enhancement in the guinea pig skin assay. Vascular permeability 
enhancement by Arg-gingipain-1 treated plasma was increased by 
addition of 1,10-phenanthroline (a kininase inhibitor and 
chelating agent for Zn ions) to a final concentration of 1 mM. 
Vascular permeability enhancement by Arg-gingipain-1 treated 
plasmas was markedly reduced when plasmas deficient in Hageman 
factor, prekallikrein or high molecular weight kininogen were 
used. These results indicate that vascular permeability 
enhancement by Arg-gingipain-1 occurs via activation of Hageman 
factor and the subsequent release of bradykinin from high 
molecular weight kininogen by kallikrein. 

Intradermal injection of Arg-gingipain-1 in the guinea pig 
also resulted in neutrophil accumulation at the site of 
injection, an activity which was dependent on proteolytic 
activity. 

The foregoing results demonstrate the ability of Arg- 
gingipain to elicit inflammatory responses in a guinea pig animal 
model. 

Recombinant Arg-gingipain is useful in methods of 
identifying agents that modulate Arg-gingipain proteinase 
activity, whether by acting on the proteinase itself or by 
preventing the interaction of the proteinase with a protein in 
the gingival area, such as C3 or C5. One such method comprises 
the steps of incubating a proteinase with a putative therapeutic, 
i.e., Arg-gingipain-inhibiting, agent; determining the activity 
of the proteinase incubated with the agent; and comparing the 
activity so determined with the activity of a control sample 
of proteinase that has not been incubated with the agent. 
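
The comparison step of the screening method above can be sketched as a short calculation. This is an illustrative sketch only: the function names, the activity units and the 50% cutoff are assumptions introduced here, not part of the described method, and the numbers are hypothetical.

```python
def percent_inhibition(activity_with_agent: float, control_activity: float) -> float:
    """Percent loss of proteinase activity relative to the untreated control."""
    if control_activity <= 0:
        raise ValueError("control activity must be positive")
    return 100.0 * (1.0 - activity_with_agent / control_activity)


def is_inhibitor(activity_with_agent: float, control_activity: float,
                 threshold_percent: float = 50.0) -> bool:
    """Flag an agent as inhibitory when inhibition meets a chosen cutoff.

    The 50% default is an arbitrary illustrative threshold (assumption).
    """
    return percent_inhibition(activity_with_agent, control_activity) >= threshold_percent


# Hypothetical readings in arbitrary activity units (e.g., a rate of
# Bz-L-Arg-pNA hydrolysis): 20 with the agent, 80 for the control.
print(percent_inhibition(20.0, 80.0), is_inhibitor(20.0, 80.0))
```

A negative result from `percent_inhibition` would indicate that the agent stimulated rather than inhibited the proteinase, which is also a form of modulation.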

SDS-PAGE analysis (without boiling) of the purified high 
molecular weight form of Arg-gingipain revealed a single band 
with an apparent molecular mass of 95 kDa. This estimate was 
confirmed by analytical chromatography over a TSK 3000SW gel 
filtration column. When the enzyme preparation was boiled before 
SDS-PAGE, however, bands of apparent molecular masses of 
approximately 50 kDa, 44 kDa, 27 kDa and 17 kDa were observed. 
These bands were not generated by treatment at temperatures below 
boiling, or by reducing agents or detergents. It was concluded 
that the 95 kDa band was the result of strong non-covalent 
binding between the lower molecular weight proteins. 

The 50 kDa proteolytic component of the high molecular 
weight Arg-gingipain was characterized with respect to N-terminal 
amino acid sequence over 22 amino acids. The sequence was 
identical to the first 22 amino acids of the 50 kDa, low 
molecular weight Arg-gingipain-1. Characterization of the high 
molecular weight Arg-gingipain activity showed the same 
dependence on cysteine (or other thiols) and the same spectrum 
of response to potential inhibitors. Although the high molecular 
weight Arg-gingipain was stimulated by glycine compounds, the 
response was only about half that observed for the low molecular 
weight form. 

The primary structure of the NH2-terminus of low molecular 
weight Arg-gingipain, determined by direct amino acid sequencing 
(SEQ ID NO: 1), was used to prepare a mixture of synthetic primer 
oligonucleotides: GIN-1-32 (SEQ ID NO: 6), coding for amino acids 
2 to 8 of the mature protein, and GIN-2-30 (SEQ ID NO: 7), 
coding for amino acids 25-32 of the mature protein. These 
primers were used in PCR on P. gingivalis DNA. A single 105-base 
pair product (P105) resulted. This was cloned into 
pCR-Script™SK(-) (Stratagene) and sequenced. Sequence analysis of 
P105 generated 49 nucleotides from an Arg-gingipain coding 
sequence. On the basis of the sequence of P105, another primer, 
GIN-8S-48 (SEQ ID NO: 8), a 48-mer corresponding to the coding 
strand of the partial Arg-gingipain gene, was synthesized in 
order to screen the λDASH DNA library using a 32P-labeled 
GIN-8S-48 probe. A partial sequence of the Arg-gingipain gene 
(nucleotides 1-3159, SEQ ID NO: 3) was determined by screening 
the P. gingivalis DNA library with the 32P-labeled GIN-8S-48 
hybridization probe (SEQ ID NO: 8). From a total of 2 x 10⁵ 
independent plaques screened, seven positive clones were isolated 
and purified. After extraction and purification, the DNA was 
analyzed with restriction enzymes: one clone (A1) has a 3.5 kb 
BamHI fragment and a 3 kb PstI fragment; another clone (B1) has 
a 9.4 kb BamHI fragment and a 9.4 kb PstI fragment; and 5 clones 
have a 9.4 kb BamHI fragment and a 10 kb PstI fragment. These 
results are similar to those obtained by Southern analysis of 
P. gingivalis DNA and are consistent with the existence of at 
least two Arg-gingipain genes. The A1 clone was chosen for 
sequencing because the expected DNA size to encode a 50-kDa 
protein is approximately 1.35 kb. The 3.159 kb PstI/BamHI 
fragment from clone A1 was subsequently subcloned into 
pBluescript SK(-) as a PstI fragment and a SmaI/BamHI fragment, 
and into M13mp18 and 19 as a PstI fragment and a PstI/BamHI 
fragment, and sequenced. In order to clone the stop codon of 
gingipain-1, which was missing in the PstI/BamHI fragment, 
PstI/HindIII double digested P. gingivalis DNA clones were 
hybridized with 32P-labeled GIN-14-20 (SEQ ID NO: 9) (nucleotides 
2911-2930 of SEQ ID NO: 3), localized at the 3' end of this 
clone. A PstI/HindIII fragment of approximately 4.3 kb was 
identified and cloned into pBluescript SK(-). Smaller fragments 
(PstI/Asp718 and BamHI/HindIII) were also subcloned into M13mp18 
and 19. 
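
The estimate of approximately 1.35 kb for a 50-kDa protein follows from simple arithmetic. The following is a minimal sketch, assuming an average residue mass of about 110 Da per amino acid (a common rule of thumb, not a value stated in this specification) and three nucleotides per codon; the function name is hypothetical.

```python
AVG_RESIDUE_MASS_DA = 110.0  # assumption: rule-of-thumb average mass per amino acid


def coding_length_bp(protein_mass_da: float,
                     avg_residue_mass_da: float = AVG_RESIDUE_MASS_DA) -> int:
    """Approximate coding length (bp) needed to encode a protein of the given mass."""
    residues = protein_mass_da / avg_residue_mass_da
    return round(residues * 3)  # three nucleotides per codon


# A 50-kDa protein needs roughly 1.36 kb of coding sequence under this
# assumption, in line with the approximately 1.35 kb cited above.
print(coding_length_bp(50_000))  # 1364
```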

SEQ ID NO: 3 is the DNA sequence of the 3159 bp PstI/BamHI 
fragment (see Table 1). 

TABLE 1 

Nucleotide sequence and deduced 
amino acid sequence of an Arg-gingipain 



10 20 30 40 

» 9 * <t 

CTG CAG AGG CCT GGT AAA GAC CGC CTC CGG ATC GAG GCC TTT GAG ACG 
GAC GTC TCC CCA CCA TTT CTG GC3 GAG CCC TAG CTC CGG AAA CTC TCC 

50 60 70 80 90 

» ■ ■» * »■ 

GGC ACA AGO CGC CGC AGC CTC CTC TTC GAA GGT GTC TCG AAC CTC CAC 
CCG TGT TCG GCG GCG TCG GAC GAG AAG CTT CCA CAG AGC TTG CAG GTG 

100 110 120 130 140 

» » » » V 

ATC GGT GAA TCC CTA GCA GTG CTC ATT CCC ATT GAG CAG CAC CGA GGT 
TAG CCA CTT AGG CAT CGT CAC GAG TAA CGG TAA CTC GTC GTG GCT CCA 

150 160 170 180 190 

* • # V * 

GTG GCG CAT CAG ATA TAT TTT CAT CAG TGG ATT ATT ACG CTA TCG GTC 
CAC CGC GTA GTC TAT ATA AAA GTA GTC ACC TAA TAA TCC CAT AGC CAG 

200 210 220 230 240 

* * • * ■ 

ACA AAA AGC CTT CCG AAT CCG ACA AAG ATA GTA GAA AGA GAG TOC ATC 
TCT TTT TCG GAA GGC TTA GGC TGT TTC TAT CAT CTT TCT CTC ACG TAG 

250 260 270 280 

» ▼ * » 

TGA AAA CAG ATC ATT CGA CGA TTA TCG ATC AAC TGA AAA CGC AGG ACT 
ACT TTT GTC TAG TAA GCT CCT AAT AGC TAG TTG ACT TTT CCG TCC TCA 

290 300 310 320 330 

» « » « x 

TGT TTT GCG TTT TGG TTC GGA AAA TTA CCT GAT CAG CAT TCG TAA AAA 
ACA AAA CGC AAA ACC AAG CCT TTT AAT GGA CTA GTC CTA AGC ATT TTT 

340 350 360 370 380 

» * * WW 

CCT GGC GCG AGA ATT TTT TCG TTT TGG CGC CAG AAT TAA AAA TTT TTG 
GCA CCG CGC TCT TAA AAA AGC AAA ACC GCG CTC TTA ATT TTT AAA AAC 

390 400 410 420 430 

» * • » m * 

GAA CCA CAG CGA AAA AAA TCT CGC GCC GTT TTC TCA GGA TTT ACA GAC 
CTT GGT GTC GCT TTT TTT AGA GCG CGG CAA AAG AGT CCT AAA TGT CTG 

440 450 460 470 480 

* » » * * 

CAC AAT CCG AGC ATT TTC GGT TCG TAA TTC ATC GAA GAG ACA GGT TTT 
GTG TTA GGC TCG TAA AAG CCA AGC ATT AAG TAG CTT CTC TGT CCA AAA 

490 500 510 520 

ACC GCA TTG AAA TCA GAG AGA GAA TAT CCG TAG TCC AAC GGT TCA TCC 
TGG CGT AAC TTT AGT CTC TCT CTT ATA GGC ATC AGG TTG CCA AGT AGG 

530 540 550 560 570 

* ♦ * • • 

TTA TAT CAG AGG TTA AAA GAT ATG GTA CGC TCA TCG AGG AGC TCA TTG 
AAT ATA GTC TCC AAT TTT CTA TAC CAT CCG AGT AGC TCC TCG ACT AAC 

580 590 600 610 620 

» * » « w 

GCT TAG TAG CTG AGA CTT TCT TAA GAG ACT ATC CGC ACC TAC AGG AAG 
CGA ATC ATC CAC TCT GAA AGA ATT CTC TGA TAG CCG TGG ATG TCC TTC 



630 640 650 660 670 

* • m w m 

TTC ATG CCA CAC AAG CCA AAC GAG GCA ATC TTC GCA GAC CGG ACT CAT 
AAG TAC CGT GTG TTC CGT TTC CTC CGT TAG AAG CGT CTG GCC TGA GTA 

680 690 700 710 720 

T * * *• » 

ATC AAA AGO ATG AAA CCA CTT TTC CAT ACC AC A ACC AAA TAG CCS TCT 
TAG TTT TCC TAC TTT GCT GAA AAG GTA TGC TCT TGG TTT ATC GGC AC A 

730 740 750 760 

V -w m ■» 

ACG GTA GAC GAA TGC AAA CCC AAT ATG AGG CCA TCA ATC AAT CCG AAT 
TCC CAT CTG CTT ACG TTT CGG TTA TAC TCC ■ GCT ACT TAG TTA GGC TTA 

770 780 790 800 810 

V V * • • 

GAC AGC TTT TGG GCA ATA TAT TAT GCA TAT TTT GAT TCG CGT TTA AAG 
CTG TCG AAA ACC CGT TAT ATA ATA CGT ATA AAA CTA AGC GCA AAT TTC 

820 830 840 850 860 

GAA AAG TCC ATA TAT TTC CGA TIC TGG TAT TTC TTT CGG TTT CTA TGT 
CTT TTC ACG TAT ATA AAC GCT AAC ACC ATA AAG AAA GCC AAA GAT ACA 

870 880 890 900 910 

W -9 W f » 

GAA TTT TGT CTC CCA AGA AGA CTT TAT AAT GCA TAA -ATA CAG AAG GGG 
CTT AAA ACA GAG GGT TCT TCT GAA ATA TTA CGT ATT TAT GTC TTC CCC 

920 930 940 950 960 

■W -V « V T 

TAC TAC ACA GTA AAA TCA TAT TCT AAT TTC ATC AAA ATO AAA AAC TTG 
ATG ATG TGT CAT TTT AGT ATA AGA TTA AAG TAG TTT TAC TTT TTG AAC 

H K K L 

970 980 990 1000 

■ *■ * * ■ 

AAC AAC TTT GTT TCG ATT GCT CTT TCC TCT TCC TTA TTA GGA GGA ATG 
TTG TTC AAA CAA AGC TAA CGA GAA ACG AGA AGG AAT AAT CCT CCT TAC 
NXFVSIALCS SLLGGM 

1010 1020 1030 1040 1050 

<■ * v ■» 

GCA TTT GCG CAG CAG ACA GAG TTG GGA CGC AAT CCG AAT GTC AGA TTG 
CGT AAA CGC GTC GTC TGT CTC AAC CCT GCG TTA GGC TTA CAG TCT AAC 
A ? A Q QTZLG R N ? N V L 

1060 1070 1080 1090 1100 

* ■» ■ « V 

CTC GAA TCC ACT CAG CAA. TCG GTG ACA AAG GTT CAG TTC CGT ATG GAC 
GAG CTT AGG TCA GTC CTT ACC CAC TGT TTC CAA GTC AAG GCA TSLC CTG 
u I S TfifiSV.TKVQF R 21 D- 

1110 1120 1130 1140 1150 

• » 9 * * 

AAC CTC AAG TTC ACC GAA GTT CAA ACC CCT AAG CGA . ATC GGA CAA GTG 
TTG GAG TTC AAG TGG CTT CAA GTT TGG GGA TTC* CCT "TAG CCT GTT CAC 
NLXFTSVQT? 5CGIGQV 

1160 1170 1180 1190 1200 

• w * • » 

CCG ACC TAT ACA GAA GGG GTT AAT CTT TCC GAA AAA GGG ATO CCT ACG 
CGC TGG ATA TGT CTT CCC CAA TTA GAA AGG CTT TTT CCC TAC GGA TCC 
? TYT2GV2JLS EKGM ?T 

1210 1220 1230 1240 

» • V * 

CTT CCC ATT CTA TCA CGC TCT TTG GCG GTT TCX CAC ACT CCT GAG ATC 
CAA CGG TAA GAT AGT GCG ACA AAC CGC CAA AGT CTG TCA GCA CTC »C 
I» 7 TLSRS1AVS DTR£K 

1250 1260 1270 1280 1290 

* » «r * » 

AAC CTA CAG GTT GTT TCC TCA AAG TTC ATC CAA AAG AAA AAT GTC CTG 
TTC CAT CTC CAA CAA AGG AGT TTC AAG TAG CTT TTC TTT TTA CAC CAC 

1300 1310 1320 1330 1340 

♦ m - » • » 

ATT GCA CCC TCC AAG GGC ATC ATT AT3 CCT AAC GAA GAT CCS AAA AAC 
TAA CCT CGC ACG TTC CCG TAC TAA TAC GCA TTG CTT CTA GGC TTT TTC 

z a p s x g h I a a m r d ? x 5 

1350 1360 1370 1380 1390 

« « * * * •» 

ATC CCT TAC CTT TAT GGA AAG AGC TAC TCG CVA AAC AAA TTC TTC CCG 
TAG GGA ATG CAA ATA CCT TTC TCG ATG AGC GTT TTG TTT AAG AAG GGC 
Z ? Y V v g X S * S Q N X T 7 ? 

1400 1410 1420 1430 1440 

w * » » » 

GGA GAG ATC GCC ACG CTT GAT GAT CCT TTT ATC CTT CGT GAT GTG CGT 
CCT CTC TAG CGG TGC GAA CTA CTA GGA AAA TAG GAA GCA CTA CAC CCA 
GilATLDDPrlLRDVK 

1450 1460 1470 1480 

* » » V 

GGA CAC CTT CTA AAC TTT GCC CCT TTG CAC TAT AAC CCT GTG ACA AAG 
CCT GTC CAA CAT TTG AAA CGC GGA AAC GTC ATA TTG GGA CAC TGT TTC 
G QVVNrAPLQ X N ? V T 2c 

1490 1500 1510 1520 1530 

* * *> v » 

ACG TIG CGC ATC TAT ACG GAA ATC ACT GTG GCA GTG AGC GAA ACT TCG 
TGC AAC GCG TAG ATA TCC CTT TAG TCA CAC CGT CAC TCG CTT TGA AGC 
TLRIYTZITVAVSETS 

1540 1550 1560 1570 1580 

• ■ » « » 

GAA CAA GGC AAA AAT ATT CTG AAC AAG AAA CCT ACA TTT GCC GGC TTT 

CTT GTT CCG TTT TTA X«A GAC TTC TTC TTT CCA TGT AAA CGG CCG AAA 

SQCSNZLKKSGTr&G? 

1590 1600 1610 1620 1630 

. * ....... T * 

GAA GAC ACA TAC AAG CGC ATG CTC ATG AAC TAC GAG CCG CGG CGT TAC 
CTT CTG TGT ATG CTC GCG TAC AAG TAC TTG ATG CTC GGC CCC CCA ATG 
£DTYK=l>J FMN2fS?G-l2_ 

1640 1650 1660 1670 1680 

» j ■» » » 

ACA CCG GTA GAG GAA AAA CAA AAT GGT CGT ATG ATC GTC ATC GTA GCC 
TGT CGC CAT CTC CTT TTT CTT TTA CCA GCA TAC TAG CAG TAG C\T CGG 

t ? v g s o m s a m t v ? v a - 

1690 1700 1710 1720 

• » w • 

AAA AAG TAT GAC GGA GAT ATT AAA GAT TTC GTT GAT TGG AAA AAC CAA 
TTT TTC ATA CTC CCT CTA TAA TTT CTA AAG CAA CTA ACC TTT TTG CTT 
5 Z y 7 £ DZ3CD?VDWKNQ 

1730 1740 1750 1760 1770 

• * » V T 

CCC GGT CTC CCT ACC GAG GTG AAA CTG GCA GAA GAT ATT GCT TCT CCC 
GCG CCA GAG GCA TGG CTC CAC TTT CAC CCT CTT CTA TAA CGA AGA CGG 
R C 2. ?. T2VKVAZDIAS? 

1780 1790 1800 1810 1820 

W • "T « . . * 

GTT ACA CCT AAT CCT ATT CAG CXG TTC GTT AAG CAA GAA TAC GAG AAA 
CAA TGT CGA TTA CGA TAA GTC CTC AAG CAA TTC CTT CTT ATG CTC TTT 
VT ANA2QQ7VJCQ ZYSK 

1830 1840 1850 1860 1870 

• • • m * 

CAA CGT AAT GAT TTG ACC TAT GTT CTT TTG GTT GGC GAT CAC AAA GAT 
CTT CCA TTA CTA AAC TGG ATA CA\ GAA AAC CAA CCG CTA GTG TTT CTA 
2 GNOLTrVwLVCDHXD 

1880 1890 1900 1910 1920 

* » » • * 

a-tt cct gcc aaa att act ccg ccg Arc aaa tcc cac c^g gta tat gga 

TAA GGA CGG TCT TAA TCA GGC CCC TAG TTT AGG CTG CTC CAT ATA CCT 
I ? A X I T ? C I K S D Q V Y c 

1930 1940 1950 1960 

» • r * 

CAA ATA GTA GCT AAT GAC! CAC TAC AAC CAA CTC TTC ATC GGT CGT TTC . 
GTT TAT CAT CCA TTA CTC CTC ATC TTC CTT CAG AAG TAG CCA GCA AAG 
QlVCNDHrNSVriGR? 

1970 1980 1990 2000 2010 

T ft • V W 

TCA TGT GAC AGC AAA GAG G3>T CTG AAG ACA CAA ATC GAT CGG ACT ATT 
AGT ACA CTC TCG TTT CTC CTA GAC TTC TGT GTT TAG CTA GCC TCA TAA 
SCSSX EOLKTQIDRTT 

2020 2030 2040 2050 2060 

9 9 * T 

CAC TAT GAG CGC AAT ATA ACC ACG GAA GAC AAA TCG CTC GGT CAC GCT 
CTG ATA CTC GCS TTA TAT TCG TGC CTT CTG TTT ACC GAG CCA GTC CGA 
HYSRNITTSDXWL GQA 

2070 2080 2090 2100 2110 

* * w * » 

CTT* TGT ATT GCT TCG GCT GAA GGA GGC CCA TCC GCA GAC AAT GGT GAA 
GAA ACA TAA CGA AGC CGA CTT CCT CCG GGT AGG CGT CTG TTA CCA CTT 
It C IASASGG? S A 3 N G £. 

2120 2130 2140 2150 2160 

* » • » » 

AGT GAT ATC CAG CAT GAG AAT GTA ATC GCC AAT CTG CTT ACC CAC TAT 
TCA CTA TAG GTC GTA CTC TTA CAT TAG CGG TTA GAC CAA TGG GTC ATA 
S DIQSENVIAWILTQ? 

2170 2180 2190 2200 

» » » a 

GGC TAT ACC AAG ATT ATC AAA TGT TAT GAT CCG GGA GTA ACT CCT AAA 
CCG ATA TGG TTC TAA TAG TTT ACA ATA CTA GGC CCT CAT TGA CGA TTT 
G Y T X I I X C Y D P GVT ? X 

2210 2220 2230 2240 2250 

• * » 9 9 

AAC ATT ATT GAT GCT TTC AAC GGA CGA ATC TCG TTC GTC AAC TAT ACG 
TTG TAA TAA CTA CGA AAG TTG CCT CCT TAG AGC AAC CAG TTG ATA TGC 
N I I D A 7 N G G I S L V N Y T- 

2260 2270 2280 2290 2300 

• » w m ♦ 

GGC CAC GGT AGC GAA ACA GCT TCG GGT ACG TCT CAC TTC GGC ACC ACT 
CCG GTG CCA TCG CTT TGT CGA ACC CCA TGC AGA CTG AAG CCG TCG TGA 
GECSSTAWGZSSFGT? 

2310 2320 2330 2340 2350 



CAT GTG AAC CAG CTT ACC AAC AGC AAC CAG CTA CCG TTT ATT TTC GAC 
GTA CAC TTC GTC GAA TCG TTG TCG TTG GTC CAT GGC AAA TAA AAG CTG 
KVKCLTKSNQLP?I?0 

2360 2370 2380 2390 2400 

GTA CCT TGT GTG AAT GGC CAT TTC CTA TTC AGC* ATS CCT TGC TTC GCA 
CAT CGA ACA CAC TTA CCG CTA AAG GAT AAG TCG TAC CGA ACG AAG CGT 
VACVNGDFLFSMPCrA 

2410 2420 2430 2440 

» w « » 

CAA CCC CTG ATC CCT CCA CAA AAA CAT CCT AAG CCG ACA GGT ACT GTT 
CTT CCG CAC TAC GCA CGT GTT TTT CTA CCA TTC GGC TCT CCA TCA CAA 
SALMRAQKDGXrTGTV 

2450 2460 2470 2480 2490 

▼ « 9 » 

CCT ATC AT.-. CCG TCT ACS ATC AAC CAG TCT TGG GCT TCT CCT ATG CCC 
C3A TAG TAT CGC AGA TGC TAG TTG GTC AGA ACC CGA AGA GGA TAG GCG 
A I IA5T2MQSWAS?XR 

2500 2510 2520 2530 2540 

9 ■ • » » 

GCG CAG GAT GAG ATG AAC GAA ATT CTG TGC GAA AAA CAC CCG AAC AAC 
CCC GTC CTA CTC TAC TTG CTT TAA GAC ACG CTT TTT GTG GGC TTG TTG 
G Q DZXKZZLCZXHPNtf 

2550 2560 2570 2580 2590 

* V 1 * • 

ATC AAC CGT ACT TTC GGT GGT GTC ACC ATG AAC GGT ATG TTT CCT ATG 
TAG TTC GCA TCA AAG CCA CCA CAG TGC TAC TTG CCA TAC AAA CGA TAC 
Z K a7?GGVTMNG2£?AM 

2600 2610 2620 2630 2640 

CTG GAA AAG TAT AAA *AG GAT GGT GAG AAG ATG CTC GAC ACA TGG ACT 
CAC CTT TTC ATA TTT TTC CTA CCA CTC TTC TAC GAG CTG TGT ACC TCA 
V SSrXSDGSKMLDTWT 



265C 
* 

GTT TTC GGC GAC CCC 
CAA AAG CCG CTG GGG 
V ? G D P 



2660 2570 

TCG CTG CTC . GTT CGT 
ACC GAC GAG CAA GCA 
S L L V R 



2650 

w 

CTT GTC CCG ACC AAA 
GAA CAG GGC TGG TTT 
L V ? T K 



ACA 
TGT 
T 



2690 27C0 2710 2720 2770 

» ■» www 

ATG CAG GTT ACG GCT CCG GCT CAG ATT AAT TTG ACG GAT GCT TCA GTC 
TAC GTC CAA TGC CGA GGC CGA GTC TAA CTA AAC TGC CTA CGA ACT CAG 
i£ Q V T A P A Q I N L TDASV 

2740 2750 2760 2770 27S0 

-am* ▼ * 

AAC GTA TCT TGC GAT TAT AAT GGT GCT ATT GCT ACC ATT TCA GCC AAT 
TTC CAT AGA ACG CTA ATA TTA CCA CGA TAA CGA TGG TAA ACT CGG TTA 
* V S C 5 V * Q i I ^ 2 2 S ~ * 

2790 2SO0 2210 2S20 2330 

GOA AAG ATG TTC GGT TCT GCA GTT GTC GAA AAT GCA ACA GCT ACA ATC 
CCT TTC TAC AAG CCA AGA CGT CAA CAG CTT TTA CCT TGT CCA TGT TAG 
G X MrG S AV.VZN GTATI 

2840 2850 2860 2670 2350 

* * * » • 

AAT CTG ACA GGT CTG ACA AAT GAA ACC ACG CTT ACC CTT ACA GTA GTT 
TTA GAC TGT CCA GAC TGT TTA CTT TCG TGC CAA TGG GAA TGT CAT CAA 
N LTGLTNSS TLTLTVV 



2390 2900 2910 2520 

GGT TAC AAC AAA GAG ACG GTT ATT AAG ACC ATC AAC ACT AAT GGT GAG 
CCA ATG TTG TTC CTC TGC CAA TAA TTC TGG TAG TTG TCA TTA CCA CTC 
G YNXZTV2KTI NTNGS 



2930 2940 2950 

CCT AAC CCC TAC CAG CCC GTT TCC 
GGA TTG GGG ATG GTC CGG CAA ACG 
? NPYQPVS 

2930 2990 2000 

• » » 

CAG AAA CTA ACG CTC AAG TGG CAT 
GTC TTT CAT TGC GAG TTC ACC CTA 
Q K V 7 i K W D 



2960 2570 

AAC TTG ACA GCT ACA ACG CAG GGT 
TTG AAC TGT CGA TGT TCC GTC CCA 
NLTATTQG 

3010 3020 

■w w 

CCA CCG AGC ACG AAA ACC AAT GCA 
CGT GGC TCG TGC TTT TGG TTA CGT 
A ? S T S T N A> 

3030 3040 3050 3060 3070 

▼ * » • a 

ACC ACT AAC ACC GCT CSC ACC GTC CAT GCC ATA CCA GAA TTC C~ CTT 
TCC — A TZA TOG CCA GCC TCC CAC CTA CCG TAT GCT CTT AAC CAA GAA 
T T T A ?. S V D C I ?. Z I: v i 

3CSC 3C90 3100 3110 I1CC 

•« » » » 

CTC TCA GTC ACC GAT GCC CCC CAA CTT CTT CGC AGC GCT CAC GCC CAC 
CAC ACT CAC TCC CTA CCC GCC CTT CAA GAA GCC TCC CCA GTC CGC CTC 
LSVSDA?£l.i?. g 5 S A ~ 

3130 3140 31=0 

ATT CTT CTT GAA GCT CAC gat GTT tcc aat gat gca tcc 
TAA CAA GAA CTT CCA GTG CTA C?-A ACC TTA CTA CCT ACC 

The exemplified nucleotide sequence encoding a mature Arg- 
gingipain, termed Arg-gingipain-2 herein, extends from nucleotide 
1630 through nucleotide 3105 in SEQ ID NO: 3 and in SEQ ID NO: 10. 
The first ATG appears at nucleotide 949 and is followed by a long 
open reading frame (ORF) of 5111 bp in Table 2 (SEQ ID NO: 10). 
This ORF was the largest one observed. However, the first ATG is 
followed by 8 others in frame (at nucleotides 1006, 1099, 1192, 
1246, 1315, 1321, 1603, and 1609). The most likely candidate to 
initiate translation is currently unknown. Which of these 
initiation codons is used in translation of the Arg-gingipain-2 
precursor can be determined by expression of the polyprotein in 
bacteria and subsequent amino-terminal sequence analysis of 
proprotein intermediates. The 5' noncoding sequence is composed 
of 948 bp. The primary structure of the mature Arg-gingipain 
molecule can be inferred from the empirical amino-terminal and 
carboxy-terminal sequences and molecular mass. Thus, mature 
Arg-gingipain-2 has an amino terminus starting at nucleotide 
1630 in SEQ ID NO: 3 and at amino acid 1 in SEQ ID NO: 4. As 
expected for an arginine-specific protease, the mature protein 
is generated by cleavage after an arginine residue. The 50 kDa 
and the 44 kDa bands from the Bz-L-Arg-pNA activity peaks have 
amino-terminal sequences identical to portions of the deduced 
amino acid sequence of gingipain, encoded at nucleotides 
1630-1695 and at nucleotides 3106-3156, respectively. From these 
data, the carboxyl terminus is most likely derived from 
autoproteolytic processing after the arginine residue encoded at 
nucleotides 3103-3105, immediately before the sequence encoding 
the amino terminus of a hemagglutinin component (which starts at 
nucleotide 3106). The deduced 492 amino acids of gingipain-2 
give rise to a protease molecule with a calculated molecular 
weight of 54 kDa, which correlates well with the molecular mass 
of 50 kDa determined by SDS-PAGE analysis. Tables 1 and 2 (see 
also SEQ ID NO: 10 and 11) present the coding sequence and 
deduced amino acid sequence of gingipain-2. The first nucleotide 
presented in the sequence belongs to the PstI cloning site and 
is referred to as nucleotide 1. Bold face letters indicate the 
potential initiation ATG sites and the first codon of the mature 
gingipain-2. The amino terminal sequence of gingipain-2 and the 
amino terminal sequence of the 44 kDa bands from the 
Bz-L-Arg-pNA activity peaks are underlined. 
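
The coordinates given above can be cross-checked with a few lines of arithmetic. The following is a minimal sketch using only the nucleotide positions stated in the text; the average residue mass of 110 Da is an assumed rule of thumb (the specification does not say how the 54 kDa value was calculated), and the function names are hypothetical.

```python
FIRST_ATG = 949
OTHER_ATGS = [1006, 1099, 1192, 1246, 1315, 1321, 1603, 1609]
MATURE_START, MATURE_END = 1630, 3105  # mature Arg-gingipain-2 coding region
AVG_RESIDUE_MASS_DA = 110.0            # assumption: rule-of-thumb residue mass


def in_frame(position: int, reference: int = FIRST_ATG) -> bool:
    """True when a codon starting at `position` shares the reading frame of `reference`."""
    return (position - reference) % 3 == 0


def codon_count(start: int, end: int) -> int:
    """Number of codons in an inclusive nucleotide range."""
    length = end - start + 1
    if length % 3 != 0:
        raise ValueError("range is not a whole number of codons")
    return length // 3


# The eight downstream ATGs and the first mature codon all lie in the frame of
# the ATG at 949; the mature region holds 492 codons, or about 54 kDa at an
# assumed 110 Da per residue.
all_in_frame = all(in_frame(p) for p in OTHER_ATGS + [MATURE_START])
residues = codon_count(MATURE_START, MATURE_END)
mass_kda = residues * AVG_RESIDUE_MASS_DA / 1000
print(all_in_frame, residues, round(mass_kda))  # True 492 54
```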

Table 2 (corresponding to SEQ ID NOS: 10-11) presents the 
nucleotide sequence encoding the complete prepolyprotein 
sequence, including both the protease component and the 
hemagglutinin component(s) of HMW Arg-gingipain. The coding 
sequence extends from an ATG at nucleotide 949 through a TAG stop 
codon at nucleotide 6063 in SEQ ID NO: 10. The deduced amino acid 
sequence is given in SEQ ID NO: 11. 
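
The relationship between the coding sequence and the deduced amino acid sequence can be illustrated with a small translation routine. This is a sketch under stated assumptions: it uses the standard genetic code, the function name is hypothetical, and the example input is the first nine codons of the open reading frame as printed in Table 2 (starting at the ATG at nucleotide 949).

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order; '*' marks a stop.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(dna: str) -> str:
    """Translate a coding-strand DNA string, stopping at the first stop codon."""
    seq = "".join(dna.split()).upper()  # drop the spaces between printed codons
    protein = []
    for i in range(0, len(seq) - len(seq) % 3, 3):
        amino_acid = CODON_TABLE[seq[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# First nine codons of the ORF as printed in Table 2.
print(translate("ATG AAA AAC TTG AAC AAG TTT GTT TCG"))  # MKNLNKFVS
```

The same routine applied to the codons beginning at nucleotide 1630 (TAC ACA CCG GTA GAG) gives YTPVE, the start of the mature protein as shown in Table 2.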



WO 95/07286 PCT/US94/10283 

TABLE 2 

Sequence Range: 1 to 7266 

>Pstl >Stul 
I I 

| * • • j * • * « 

CTGCAGAGGG CTGGTAAAGA CCGCCTCGGG ATCGAGGCCT TTGAGACGGG CACAAGCCGC CGCAGCCTCC 

100 

• ••••»• 

TCTTCGAAGG TGTCTCGAAC GTCCACATCG GTGAATCCGT AGCAGTGCTC ATTGCCATTG AGCAGCACCG 

200 

• ••WW • • 

AGGTGTGGCG CATCAGATAT ATTTTCATCA GTGGATTATT AGGGTATCGG TCAGAAAAAG CCTTCCGAAT 

>Clal 
I 

« • • • • | • « 

CCGACAAAGA TAGTAGAAAG AGAGTGCATC TGAAAACAGA TCATTCGAGG ATTATCGATC AACTGAAAAG 

300 

• • * • • ♦ « 

GCAGGAGTTG TTTTGCGTTT TGGTTCGGAA AATTACCTGA TCAGCATTCG TAAAAACGTG GCGCGAGAAT 

400 

• w • * * • • 

TTTTT CGTTT TGGCGCGAGA ATTAAAAATT TTTGGAACCA CAGCGAAAAA AATCTCGCGC CGTTTTCTCA 

• •«•••• 

GGATTTACAG ACCACAATCC GAGCATTTTC GGTTCGTAAT TCATCGAAGA GACAG GTT T T ACCGCATTGA 

500 

• ♦ * • * • • 

AATCAGAGAG AGAATATCCG TAGTCCAACG GTTCATCCTT ATATCAGAGG TTAAAAGATA TGGTACGCTC 

600 

• • • . • • • • 

ATCGAGGAGC TGATTGGCTT AGTAG3TGAG ACTTTCTTAA GAGACTATCG GCACCTACAG GAAG7TCATG 

700 

• ••**•» 

GCACACAAGG CAAAGGAGGC AATCTTCGCA GACCGGACTC ATATCAAAAG GATGAAACGA CTTTT CCATA 

• * • • • • * 

CGACAACCAA ATAGCCGTCT ACGGTAGACG AATGCAAACC CAATATGAGG CCATCAATCA ATCCGAATGA 

800 

CAGCTTTTGG GCAATATATT ATGCATATTT TGATTCGCGT TTAAAGGAAA AGTGCATATA TTTGCGATTG 

900 

• • • • • • * 

TGGTATTTCT TTCGGTTTCT ATGTGAATTT TGTCTCCCAA GAAGACTTTA TAATGCATAA ATACAGAAGG 

• • • * » • 

GGTACTACAC AGTAAAATCA TATTCTAATT TCATCAAA ATG AAA AAC TTG AAC AAG TTT GTT TCG 

MKNLNKFV S> 

1000 

• ♦ • • • * 

ATT GCT CTT TGC TCT TCC TTA TTA GGA GGA ATG GCA TTT GCG CAG CAG ACA GAG TTG 
IALCSSLLGGMAFAQQTEL> 

GGA CGC AAT CCG AAT GTC AGA TTG CTC GAA TCC ACT CAG CAA TCG GTG ACA AAG GTT 
GRNPNVRLLESTQQSVTKV> 

1100 

• • « • • • 

CAG TTC CGT ATG GAC AAC CTC AAG TTC ACC GAA GTT CAA ACC CCT AAG GGA ATC GGA 
QFRMDNLKFTEVQTPKGIG> 

1200 

Table 2 (contd.) 

• * * * 

CAA GTG CCG ACC TAT ACA GAA GGG GTT AAT CTT TCC GAA AAA GGG ATG CCT ACG CTT 
QVPTYTEGVNLSEKGMPTL> 

• « * • * 

CCC ATT CTA TCA CGC TCT TTG GCG GTT TCA GAG ACT CGT GAG ATG AAG GTA GAG GTT 
FILSRSLAVSDTREMKVE V> 

1300 

• • * * 

GTT TCC TCA AAG TTC ATC GAA AAG AAA AAT GTC CTG ATT GCA CCC TCC AAG GGC ATG 
VSSKFI EKKNVL IAPSKGM> 

• * * » * * 

ATT ATG CGT AAC GAA GAT CCG AAA AAG ATC CCT TAG GTT TAT GGA AAG AGC TAC TCG 
IMRNEDPKKI PYVYGKSYS> 

X400 

• » • • * * 

CAA AAC AAA TTC TTC CCG GGA GAG ATC GCC ACG CTT GAT GAT CCT TTT ATC CTT CGT 
QNKFFPGEIATLDDPFILR> 

GAT GTG CGT GGA CAG GTT GTA AAC TTT GCG CCT TTG CAG TAT AAC CCT GTG ACA AAG 
DVRGQVVN F A PL Q Y N P V T K> 

1500 

ACG TTG CGC ATC TAT ACG GAA ATC ACT GTG GCA GTG AGC GAA ACT TCG GAA CAA GGC 
TLRIYTEITVAVSETSEQG> 

1600 

• ♦ • * * 

AAA AAT ATT CTG AAC AAG AAA GGT ACA TTT GCC GGC TTT GAA GAC ACA TAC AAG CGC 
KN I LNKKG T F AG F EDTY KR> 

• * • • * 

ATG TTC ATG AAC TAC GAG CCG GGG CGT TAC ACA CCG GTA GAG GAA AAA CAA AAT GGT 
MFMNYEPGRYTPVEEKQN G> 

1700 

* • » • * * 

CGT ATG ATC GTC ATC GTA GCC AAA AAG TAT GAG GGA GAT ATT AAA GAT TTC GTT GAT 
RMIVI .VA K K Y EG D I KDFVD> 

TGG AAA AAC CAA CGC GGT CTC CGT ACC GAG GTG AAA GTG GCA GAA GAT ATT GCT TCT 
WKNQRGLRT EVKVAEDI A S> 

1800 

CCC GTT ACA GCT AAT GCT ATT CAG CAG TTC GTT AAG CAA GAA TAC GAG AAA GAA GGT 
PVTANAIQQ FVKQEYEKE G> 

w * * 

AAT GAT TTG ACC TAT GTT CTT TTG GTT GGC GAT CAC AAA GAT ATT CCT GCC AAA ATT 
NDLTYVLLVGDHKDIPAKI> 

1900 

• * » • » * 

ACT CCG GGG ATC AAA TCC GAC CAG GTA TAT GGA CAA ATA GTA GGT AAT GAC CAC TAC 
TPGIKSDQVYGQ IVGNDK Y> 

2000 

AAC GAA GTC TTC ATC GGT CGT TTC TCA TGT GAG AGC AAA GAG GAT CTG AAG ACA CAA 
NEVFIGRFS CES KEDLKT Q> 

XTlal 

I 

j » » • • * 

ATC GAT CGG ACT ATT CAC TAT GAG CGC AAT ATA ACC ACG GAA GAC AAA TGG CTC GGT 
IDRTIHYERNITTEDKWLG> 

Table 2 (contd.) 



2100 

■» • » • * » 

CAG GCT CTT TGT ATT GCT TCG GCT GAA GGA GGC CCA TCC GCA GAC AAT GGT GAA ACT 
Q A L C I AS A E G G P S ADNG £ S> 

>EcoR5 
I 

| * • - • » * « 

GAT ATC CAG CAT GAG AAT GTA ATC GCC AAT CTG CTT ACC CAG TAT GGC TAT ACC AAG 
DI QHENVI ANLLTQYGYT K> 

2200 

• • w » » 

ATT ATC AAA TGT TAT GAT CCG GGA GTA ACT CCT AAA AAC ATT ATT GAT GCT TTC AAC 
I I KCYDPGVT P K N I I D A F N> 



GGA GGA ATC TCG TTG GTC AAC TAT ACG GGC CAC GGT AGC GAA ACA GCT TGG GGT ACG 
G G I S LVNY TG H G S E T A W G T> 

2300 

• * » • * • 

TCT CAC TTC GGC ACC ACT CAT GTG AAG CAG CTT ACC AAC AGC AAC CAG CTA CCG TTT 
SHFGTTHVKQLT.NSNQL,PF> 

>Sphl 
I 

! 2400 

* • * • | • w 

ATT TTC GAC GTA GCT TGT GTG AAT GGC GAT TTC CTA TTC AGC ATG CCT TGC TTC GCA 
I F CVACVN G D F L, F S M ? C FA> 



GAA GCC CTG ATG CGT GCA CAA AAA GAT GGT AAG CCG ACA GGT ACT GTT GCT ATC ATA 
E A L» M R A O K -Z> G K P T GTVA I I> 

2500 

• « • * * * 

GCG TCT ACG ATC AAC CAG TCT TGG GCT TCT CCT ATG CGC GGG CAG GAT GAG ATG AAC 
ASTINQSWAS PMRGQDEM N> 



GAA ATT CTG TGC GAA AAA CAC CCG AAC AAC ATC AAG CGT ACT TTC GGT GGT GTC ACC 
EI LCEKHPNN I KRTFGGV T> 

2600 

ATG AAC GGT ATG TTT GCT ATG GTG GAA AAG TAT AAA AAG GAT GGT GAG AAG ATG CTC 
MNGMFAMVEK.YKKDGEKM L> 



GAC ACA TGG ACT GTT TTC GGC GAC CCC TCG CTG CTC GTT CGT ACA CTT GTC CCG ACC 
DTWTVFGD P S LLVRTLV PT> 

2700 

• •* WW* <. 

AAA ATG CAG GTT ACG GCT CCG GCT CAG ATT AAT TTG ACG GAT GCT TCA GTC AAC GTA 
KM Q V TA PA Q I NL T DA SVN V> 



TCT TGC GAT TAT AAT GGT GCT ATT GCT ACC ATT TCA GCC AAT GGA AAG ATG TTC GGT 
SCDYNGAI AT I SANGKMFG> 

>Pstl 
I 

2800 | 

j * • • m 

TCT GCA GTT GTC GAA AAT GGA ACA CCT ACA ATC AAT CTG ACA GGT CTG ACA AAT GAA 
S'A V V E K G T A T I N h T 'G-L T N E> 

2900 

AGC ACG CTT ACC CTT ACA GTA GTT GGT TAC AAC AAA GAG ACG GTT ATT AAG ACC ATC 
STLTLTVVGYNKETVIKTI> 

AAC ACT AAT GGT GAG CCT AAC CCC TAC CAG CCC GTT TCC AAC TTG ACA GCT ACA ACG 
NTNGEPNPYQ pVSNLTAT T> 

3000 

CAG GGT CAG AAA GTA ACG CTC AAG TGG GAT GCA CCG AGC ACG AAA ACC AAT GCA ACC 
Q GQKVTLKWDAPSTKTNAT> 

ACT AAT ACC GCT CGC AGC GTG GAT GGC ATA CGA GAA TTG GTT CTT CTG TCA GTC AGC 
TNTARSVDGIRELVLLSVS> 

3100 

GAT GCC CCC GAA CTT CTT CGC AGC GGT CAG GCC GAG ATT GTT CTT GAA GCT CAC GAT 
DA PELLRSGQAEIVLEA HD> 

>BamHl 

GTT TGG AAT GAT GGA TCC GGT TAT CAG ATT CTT TTG GAT GCA GAC CAT GAT CAA TAT 
VWNDGS G Y Q I LLDA DKDQY> 

3200 ^ 

GGA CAG GTT ATA CCC AGT GAT ACC CAT ACT CTT TGG CCG AAC TGT AGT GTC CCG GCC 
GQ VIPSDTHTLWPNCSVPA> 

3300 

AAT CTG TTC GCT CCG TTC GAA TAT ACT GTT CCG GAA AAT GCA GAT CCT TCT TGT TCC 
NL FA?FEYTVPENADPSCS> 

CCT ACC AAT ATG ATA ATG GAT GGT ACT GCA TCC GTT AAT ATA CCG GCC GGA ACT TAT 
PTNMIMDGTASVNIPAGTY> 

3400 

• * * * * 

GAC TTT GCA ATT GCT GCT CCT CAA GCA AAT GCA AAG ATT TGG ATT GCC GGA CAA GGA 
DFAIAAPQANAKIWIAGQG> 

CCG ACG AAA GAA GAT GAT TAT GTA TTT GAA GCC GGT AAA AAA TAC CAT TTC CTT ATG 
PT K EDDYVF EAGKKYKFLM> 

3500 w 

AAG AAG ATG GGT AGC GGT GAT GGA ACT GAA TTG ACT ATA AGC GAA GGT GGT GGA AGC 
KKMGSGDGTELTISEGGGS> 

GAT TAC ACC TAT ACT GTC TAT CGT GAC GGC ACG AAG ATC AAG GAA GGT CTG ACG GCT 
DY TYTVYRDGTKIKEGLTA> 

3600 

ACG ACA TTC GAA GAA GAC GGT GTA GCT ACG GGC AAT CAT GAG TAT TGC GTG GAA GTT 
TTFE EDGVATGNHEYCVEV> 

>BamHl 
I 

3700 | 
I 

AAG TAC ACA GCC GGC GTA TCT CCG AAG GTA TGT AAA GAC GTT ACG GTA GAA GGA TCC 
KYTAGVSPKVCKDVTVEGS> 

AAT GAA TTT GCT CCT GTA CAG AAC CTG ACC GGT AGT GCA GTC GGC CAG AAA GTA ACG 

NEFAPVQNLTGSAVGQKVT> 

>Asp718 
I 

| 3800 

T f ? T 7 " T V " 7 7 7 7 7 7 7 7 c? " ! 
fT??"??? T77 74 T< T77 7777 

>Clal 

! 3900 

f T7 7 7 7 7 7 T 7 7 7 7 T 7 7 7 7 

w * 

GCT GGC TAG aIt AGC AAT GGT TGT GTA TAT itt GAG TCA TTC GGT CTT GGT GGT ATA 
AGYNSN GCVYSESFGLGGI> 

4000 _ 



GGA CTT CTT ACC CCT GAC AAC TAT CTG ATA ACA CCG GCA TTG GAT TTG CCT AAC GGA 
GV LTPDNYLITPALDLPN^> 

4100 

CCC CTG TAT GCA TCT TCC ACC OCT AAC CAT CCA TCC «C TTC ACS AAT OCT TTC TTC 

. 4200 

GAA GAG ACG ATT ACG GCA AAA GGT GTT CGC TCG CCG GAA GCT ATT CGT GGT CGT ATA 
ES7 ITAKGVRSPSAIRGRX> 

CAG GGT ACT TGG CGC CAG AAG ACG GTA GAC CTT CCC GCA GGT ACG AAA TAT GTT GCT 
O GTWRQ KTVDL PACT KYVA> 



Q G 

4300 



TTC CGT CAC TTC CAA AGC ACG GAT ATG TTC TAC ATC GAC CTT GAT GAG GTT GAG ATC 
FRHFQS T DMFY I DLu fi. V r. i> 

aIg GCC AAC GGC AAG CGC GCA ^C TTC ACG gIa ACG TTC GAG TCT TCT ACT CAT GGA 
KANG KRADFTETr E S S T H G> 



>Clal 
I 



4400 ! 



GAG GCA CCG GCG GAA TGG ACT ACT ATC GAT GCC GAT GGC GAT GGT CAG GGT T«5 CTC 
EAPA EWTTlDADGDGQGWi-> 

4500 

TCT CTG TCT TCC GGA CAA TTG GAC TGG CTG ACA GCT CAT CCC GCC ACC AAC GTA CTA 

GCC TCT TTC TCA TGG AAT GGA ATG GCT TTG AAT CCT GAT AAC TAT CTC ATC TCA AAG 
AS ? SWNGMA LN.P DNV L I SK> 



4600 

GAT GTT ACA GGC GCA ACG AAG GTA AAG TAC TAG TAT GCA GTC AAC GAC GGT TTT CCC 
DVTGATKVKYYYAVNOGFP> 



GGG GAT CAC TAT GCG GTG ATG ATC TCC AAG ACG GGC ACG AAC GCC GGA GAC TTC ACG 
G DH Y A V M I SKTGTNAGDF T> 

4700 

♦ » » • 

GTT GTT TTC GAA GAA ACG CCT AAC GGA ATA AAT AAG GGC GGA GCA AGA TTC GGT CTT 
V V F EST PNG I NKGGARFGL> 



TCC ACG GAA GCC AAT GGC GCC AAA CCT CAA AGT GTA TGG ATC GAG CGT ACG GTA GAT 
S TEANGAKPQSVWI ERTVD> 

4800 

TTG CCT GCG GGC ACG AAG TAT GTT GCT TTC CGT CAC TAC AAT TGC TCG GAT TTG AAC 
LPA ,GTKYVAFRHYNCSDLN> 



>Ncol 

I 

1 4900 
♦ » * » » 

TAC ATT CTT TTG GAT GAT ATT CAG TTC ACC ATG GGT GGC AGC CCC ACC CCG ACC GAT 
YILLDDIQFTMGGSPT?TD> 



TAT ACC TAC ACG GTG TAT CGT GAC GGT ACG AAG ATC AAG GAA GGT CTG ACC GAA ACG 
Y T Y T V Y RDGTK I KEGLTE T> 

5000 

♦ • * • • • 

ACC TTC GAA GAA GAC GGC GTA GCT ACA GGC AAT CAT GAG TAT TGC GTG GAA GTG. AAG 
TFEEDGVATGNHEYCVEVK> 



TAC ACA GCC GGC GTA TCT CCG AAA GAG TGC GTA AAC GTA ACT ATT AAT CCG ACT CAG 
YTAGVS PKECVNVTINPTQ> 

5100 

W « W W • 9 

TTC AAT CCT GTA AAG AAC CTG AAG GCA CAA CCG GAT GGC GGC GAC GTG GTT CTC AAG 
FNPVKNL KAQPDGGDVVLK> 



TGG GAA GCC CCG AGC GCA AAA AAG ACA GAA GGT TCT CGT GAA GTA AAA CGG ATC GGA 
WEAPSAKKTEG SREVKRIG> 

5200 

• * * * • » 

GAC GGT CTT TTC GTT ACG ATC GAA CCT GCA AAC GAT GTA CGT GCC AAC GAA GCC AAG 
DGLFVTIEPANDVRANEAK>

5300 

• • • * • 

GTT GTG CTC GCA GCA GAC AAC GTA TGG GGA GAC AAT ACG GGT TAC CAG TTC TTG TTG 
VV L A A D NVWG DNTG YQ F LL> 



GAT GCC GAT CAC AAT ACA TTC GGA AGT GTC ATT CCG GCA ACC GGT CCT CTC TTT ACC 
DADHNTFGSVIPATGPLFT> 

5400 

» « » * * • 

GGA ACA GCT TCT TCC AAT CTT TAC AGT GCG AAC TTC GAG TAT TTG ATC CCG GCC AAT 
GTASSNLYSANFEYL I PAN> 



GCC GAT CCT GTT GTT ACT ACA CAG AAT ATT ATC GTT ACA GGA CAG GGT GAA GTT GTA 
ADPVVTTQNIIVTGQGEVV>
5500 

• • • * * 

ATC CCC GGT GGT GTT TAC GAC TAT TGC ATT ACG AAC CCG GAA CCT GCA TCC GGA AAG 
IPGGVYDYCITNPEPASGK>

ATG TGG ATC GCA GGA GAT GGA GGC AAC CAG CCT GCA CGT TAT GAC GAT TTC ACA TTC 
MWIAGDGGNQPARYDDFTF>

5600 

GAA GCA GGC AAG AAG TAC ACC TTC ACG ATG CGT CGC GCC GGA ATG GGA CAT GGA ACT 
EAGKKYTFTMRRAGMGDGT> 

5700 

w 

GAT ATG GAA GTC GAA GAC GAT TCA CCT GCA AGC TAT ACC TAT ACA GTC TAT CGT GAC 
DM EVEDDSPASYTYTVYRD> 



GGC ACG AAG ATC AAG GAA GGT CTG ACC GAA ACG ACC TAC CGC GAT GCA GGA ATG AGT 
GTKIKEGLTETTYRDAGMS>

5800 

GCA CAA TCT CAT GAG TAT TGC GTA GAG GTT AAG TAC GCA GCC GGC GTA TCT CCG AAG 
AQSHEYCVEVKYAAGVSPK>

GTT TGT GTG GAT TAT ATT CCT GAC GGA GTG GCA GAC GTA ACG GCT CAG AAG CCT TAC 
VCVDYIPDGVADVTAQKPY>

5900 

ACG CTG ACA GTT GTT GGA AAG ACG ATC ACG GTA ACT TGC CAA GGC GAA GCT ATG ATC 
TLTVVGKTITVTCQGEAMI>

TAC GAC ATG AAC GGT CGT CGT CTG GCA GCC GGT CGC AAC ACA GTT GTT TAC ACG GCT 
YDMNGRRLAAGRNTVVYTA>

6000 

CAG GGC GGC TAC TAT GCA GTC ATG GTT GTC GTT GAC GGC AAG TCT TAC GTA GAG AAA 
QGGYYAVMVVVDGKSYVEK>



6100 

* * • * * 

CTC GCT GTA AAG TAA TTCTGTC TTGGACTCGG AGACTTTGTG CAGACACTTT TAATATAGGT 
L A V K •> 

>ClaI 
I 

• | • * • * 

CTGTAATTGT CTCAGAGTAT GAATCGATCG CCCGACCTCC TTTTAAGGAA GTCTGGGCGA CTTCGTTTTT 

6200 

ATGCCTATTA TTCTAATATA CTTCTGAAAC AATTTGTTCC AAAAAGTTGC ATGAAAAGAT TATCTTACTA 

6300 

TCTTTGCACT GCAAAAGGGG AGTTTCCTAA GGTTTTCCCC GGAGTAGTAC GGTAATAACG GTGTGGTAGT 
>Pvu2 

TCAGCTGGTT AGAATACCTG CCTGTCACGC AGGGGGTCGC GGGTTCGAGT CCCCTCCATA CCGCTAAATA 
6400 

GCTGAAAGAT AGCCTATAGG TCATCTGAAG CAATTTTAGA AACGAATCCA AAAGCGTCTT AATTCCAACG 
6500 

AATTAAGGCG CTTTTTCTTT GTCGCCACCC CACACGTCGG ATGAGGTTCG GAATAGGCGT ATATTCCGTA 

6600 

• * » • * • * 

AATATGCCTC CGGTGGTTCC ATTTTGGTTA CAAAAAACAA AGGGGCTGAA AATTGTAACC ACAGACGACG 

>NdeI 
I 

» * • j • n 

TTAAGACGAT GTTTAGACGA TTGACAAATT ACTCTGTTTC AAAATCATAT CTCGAACTTT GTAGCCGTAT 

6700 

• * » • • » m 

GGTTACACTA ATTTTGGAGC AAAATGAAGA GTCAATTTCG TTCAGTTTTT TACTTGCGCA GCAATTACAT 

6800 

• * • ♦ * w m 

CAACAAAGAA GGTAAAACTC CTGTCCTTAT TCGTATTTAT CTGAATAAGG AACGCCTGTC GTTGGGTTCG 

» • » * • * 9 

ACAGGGCTGG CTGTTAATCC CATACAATGG GATTCAGAAA AAGAGAAAGT CAAAGGACAT AGTGCAGAAG 
6900 

CACTTGAAGT CAATCGAAAG ATCGAAGAAA TCAGGGCTGA TATTCTGACC ATTTACAAAC GTTTGGAAGT 

7000 

» • • » » * w 

AACAGTAGAT GATTTGACGC CGGAGAGGAT CAAATCGGAA TACTGCGGAC AGACGGATAC ATTAAACAGT 

• • • * w * * 

ATAGTGGAAC TTTTCGATAA ACATAACGAG GATGTCCGGG CCCAGGTGGG AATCAATAAA ACGGCTGCCA 
7100 

• * • * • * • 

CTTTACAAAA ATACGAAAAC AGCAAACGGC ATTTTACCCG ATTCCTCAAA GCGAAGTACA ACAGAACGGA 

7200 

• • * * • * V 

TCTCAAATTC TCAGAGCTTA CCCCGTTGGT CATTCATAAC TTTGAGATAT ATCTGCTGAC TGTAGCCCAT 

>Hind3 
I . 
• I 

TGTTGCCCGA ATACGGCAAC CAAAATCTTG AAGCTT 






WO 95/07286 



PCT/US94/10283 




Cleavage of the precursor protein after the Arg residue at 
amino acid 227 removes the N-terminal precursor portion, and 
cleavage after the Arg residues at amino acids 719, 1091 and 1429 
releases a low molecular weight Arg-gingipain and three 
hemagglutinin components. The 44 kDa hemagglutinin component has 
an amino acid sequence as given in SEQ ID NO:11 from amino acids 
720-1091, with a calculated molecular weight of 39.4 kDa, 
consistent with that estimated by gel electrophoresis. The 17 kDa 
hemagglutinin component has an amino acid sequence as given in 
SEQ ID NO:11 at amino acids 1092-1429, and a calculated molecular 
weight of 37.1 kDa. The 27 kDa hemagglutinin component has an 
amino acid sequence extending from amino acids 1430-1704 in 
SEQ ID NO:11, and a calculated molecular weight of 29.6 kDa. 
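
The residue ranges quoted above follow directly from the four cleavage positions. The following is a minimal sketch of that bookkeeping, not part of the original disclosure: the function name and the region labels are illustrative, and the precursor is assumed to end at residue 1704 (the last residue cited for the 27 kDa component).

```python
# Cleavage occurs after these Arg residues (positions taken from the text).
CLEAVAGE_AFTER = [227, 719, 1091, 1429]
# Assumption: the precursor ends at residue 1704, the last residue cited above.
PRECURSOR_LENGTH = 1704

# Illustrative labels for the regions, in N- to C-terminal order.
LABELS = [
    "N-terminal precursor portion",
    "low molecular weight Arg-gingipain",
    "44 kDa hemagglutinin component",
    "17 kDa hemagglutinin component",
    "27 kDa hemagglutinin component",
]


def fragments(cleavage_after, length):
    """Return 1-based inclusive (start, end) ranges left by cleaving
    after each listed residue of a chain of the given length."""
    bounds = [0] + sorted(cleavage_after) + [length]
    return [(bounds[i] + 1, bounds[i + 1]) for i in range(len(bounds) - 1)]


if __name__ == "__main__":
    for label, (start, end) in zip(LABELS, fragments(CLEAVAGE_AFTER, PRECURSOR_LENGTH)):
        print(f"{label}: residues {start}-{end} ({end - start + 1} aa)")
```

The three hemagglutinin ranges produced this way (720-1091, 1092-1429 and 1430-1704) match those given in the paragraph above.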

Table 3 is the result of a sequence comparison of the 44 kDa, 
27 kDa and 17 kDa hemagglutinin domains of Arg-gingipain 
complexes, with alignment of regions of amino acid identity 
which, without wishing to be bound by any particular theory, are 
postulated to be the domains responsible for hemagglutinin 
activity. Identical amino acids among all hemagglutinin domains 
are in capital letters, and amino acids which are not conserved 
are shown in lower case letters. In the case of the proteolytic 
component, only a limited region with significant match is shown. 

A genomic DNA library was also prepared from virulent 
P. gingivalis W50. Two clones were identified as containing 
Arg-gingipain coding sequence. 0.5 and 3.5 kb BamHI fragments were 
sequenced; they exhibited 99% nucleotide sequence identity with 
about 3160 plus 557 bp of P. gingivalis H66 DNA containing 
Arg-gingipain coding sequence. A comparison of the deduced amino 
acid sequences of the encoded Arg-gingipain sequences revealed 
99% identity. 

Tables 1 and 2 both represent sequences from P. gingivalis. 
However, it is understood that there will be some variations in 
the amino acid sequences and encoding nucleic acid sequences for 
Arg-gingipain from different P. gingivalis strains. The ordinary 
skilled artisan can readily identify and isolate 
Arg-gingipain-encoding sequences from other strains where there is 
at least 70% homology to the specifically exemplified sequences 
herein, using the sequences provided herein taken with what is 
well known to the art. Also within the scope of the present 
invention are Arg-gingipains where the protease or proteolytic 
component has at least about 85% amino acid sequence identity 
with an amino acid sequence exemplified herein. 

It is also understood by the skilled artisan that there can 
be limited numbers of amino acid substitutions in a protein 
without significantly affecting function, and that nonexemplified 
gingipain-1 proteins can have some amino acid sequence divergence 
from the exemplified amino acid sequence. Such naturally 
occurring variants can be identified, e.g., by hybridization to 
the exemplified (mature) Arg-gingipain-2 coding sequence (or a 
portion thereof capable of specific hybridization to 
Arg-gingipain sequences) under conditions appropriate to detect at 
least about 70% nucleotide sequence homology, preferably about 
80%, more preferably about 90% and most preferably 95-100% 
sequence homology. Preferably the encoded Arg-gingipain protease 
or proteolytic component has at least about 85% amino acid 
sequence identity to an exemplified Arg-gingipain amino acid 
sequence. 

It is well known in the biological arts that certain amino 
acid substitutions can be made in protein sequences without 
affecting the function of the protein. Generally, conservative 
amino acid substitutions are tolerated without affecting protein 
function. Similar amino acids can be those that are similar in 
size and/or charge properties; for example, aspartate and 
glutamate, and isoleucine and valine, are both pairs of similar 
amino acids. Similarity between amino acid pairs has been 
assessed in the art in a number of ways. For example, Dayhoff et 
al. (1978) in Atlas of Protein Sequence and Structure, Volume 5, 
Supplement 3, Chapter 22, pages 345-352, which is incorporated by 
reference herein, provides frequency tables for amino acid 
substitutions which can be employed as a measure of amino acid 
similarity. Dayhoff et al.'s frequency tables are based on 
comparisons of amino acid sequences for proteins having the same 
function from a variety of evolutionarily different sources. 

The skilled artisan recognizes that other P. gingivalis 
strains can have coding sequences for a protein with the 
distinguishing characteristics of an Arg-gingipain; those coding 
sequences may be identical to or synonymous with the exemplified 
coding sequence, or there may be some variation(s) in the encoded 
amino acid sequence. An Arg-gingipain coding sequence from a 
P. gingivalis strain other than H66 can be identified by, e.g., 
hybridization to a polynucleotide or an oligonucleotide having 
the whole or a portion of the exemplified coding sequence for 
mature gingipain, under stringency conditions appropriate to 
detect a sequence of at least 70% homology. 

A polynucleotide or fragment thereof is "substantially 
homologous" (or "substantially similar") to another 
polynucleotide if, when optimally aligned (with appropriate 
nucleotide insertions or deletions) with another polynucleotide, 
there is nucleotide sequence identity for approximately 60% of 
the nucleotide bases, usually approximately 70%, more usually 
about 80%, preferably about 90%, and more preferably about 95% 
to 100% of the nucleotide bases. 
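
As an illustration of the definition above, percent identity can be computed over two sequences that have already been optimally aligned. The following is a minimal sketch under stated assumptions: gaps are written as "-", the denominator is the number of alignment columns (one of several conventions in use), and the hard cutoffs stand in for the "approximately" and "about" figures in the text. The function names and the example sequences are hypothetical.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent of alignment columns holding the same base in both sequences.

    Both inputs must already be aligned to equal length, with '-' marking
    an insertion or deletion. Gap columns never count as matches.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("aligned sequences must not be empty")
    matches = sum(
        1
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if a == b and a != "-"
    )
    return 100.0 * matches / len(aligned_a)


def homology_tier(identity):
    """Map a percent identity onto the tiers named in the text.

    The text says 'approximately'; the exact cutoffs here are a simplification.
    """
    tiers = [
        (95.0, "more preferably (about 95% to 100%)"),
        (90.0, "preferably (about 90%)"),
        (80.0, "more usually (about 80%)"),
        (70.0, "usually (approximately 70%)"),
        (60.0, "substantially homologous (approximately 60%)"),
    ]
    for cutoff, label in tiers:
        if identity >= cutoff:
            return label
    return "below the substantial homology range"


if __name__ == "__main__":
    # Hypothetical aligned fragments, not taken from the sequence tables.
    value = percent_identity("ATGCCGTA-A", "ATGCCGTACA")
    print(value, homology_tier(value))
```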

Alternatively, substantial homology (or similarity) exists 
when a polynucleotide or fragment thereof will hybridize to 
another polynucleotide under selective hybridization 
conditions. Selectivity of hybridization exists under 
hybridization conditions which allow one to distinguish the 
target polynucleotide of interest from other polynucleotides. 
Typically, selective hybridization will occur when there is 
approximately 55% similarity over a stretch of about 14 
nucleotides, preferably approximately 65%, more preferably 
approximately 75%, and most preferably approximately 90%. See 
Kanehisa (1984) Nuc. Acids Res. 12:203-213. The length of 
homology comparison, as described, may be over longer stretches, 
and in certain embodiments will often be over a stretch of about 
17 to 20 nucleotides, and preferably about 36 or more 
nucleotides. 

The hybridization of polynucleotides is affected by such 
conditions as salt concentration, temperature, or organic 
solvents, in addition to the base composition, length of the 
complementary strands, and the number of nucleotide base 
mismatches between the hybridizing polynucleotides, as will be 
readily appreciated by those skilled in the art. Stringent 
temperature conditions will generally include temperatures in 
excess of 30°C, typically in excess of 37°C, and preferably in 
excess of 45°C. Stringent salt conditions will ordinarily be 
less than 1 M, typically less than 500 mM, and preferably less 
than 200 mM. However, the combination of parameters is much more 
important than the measure of any single parameter (Wetmur and 
Davidson (1968) J. Mol. Biol. 31, 349-370). 
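
The temperature and salt ranges above can be written down as a simple lookup. The sketch below is only an illustration with hypothetical function names; it classifies each parameter on its own, whereas the text stresses that the combination of parameters matters more than any single value.

```python
def temperature_tier(temp_c):
    """Classify a hybridization temperature (degrees C) against the stated ranges."""
    if temp_c > 45:
        return "preferred"
    if temp_c > 37:
        return "typical"
    if temp_c > 30:
        return "general"
    return "not stringent"


def salt_tier(salt_molar):
    """Classify a salt concentration (mol/l) against the stated ranges."""
    if salt_molar < 0.2:
        return "preferred"
    if salt_molar < 0.5:
        return "typical"
    if salt_molar < 1.0:
        return "ordinary"
    return "not stringent"


if __name__ == "__main__":
    # Hypothetical conditions: 42 degrees C in 0.3 M salt.
    print(temperature_tier(42), salt_tier(0.3))
```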

An "isolated" or "substantially pure" polynucleotide is a 
polynucleotide which is substantially separated from other 
polynucleotide sequences which naturally accompany a native 
gingipain-1 sequence. The term embraces a polynucleotide 
sequence which has been removed from its naturally occurring 
environment, and includes recombinant or cloned DNA isolates, 
chemically synthesized analogues and analogues biologically 
synthesized by heterologous systems. 

A polynucleotide is said to "encode" a polypeptide if, in 
its native state or when manipulated by methods known to those 
skilled in the art, it can be transcribed and/or translated to 
produce the polypeptide or a fragment thereof. The anti-sense 
strand of such a polynucleotide is also said to encode the 
sequence. 

A nucleotide sequence is operably linked when it is placed 
into a functional relationship with another nucleotide sequence. 
For instance, a promoter is operably linked to a coding sequence 
if the promoter affects its transcription or expression. 
Generally, operably linked means that the sequences being linked 
are contiguous and, where necessary to join two protein coding 
regions, contiguous and in reading frame. However, it is well 
known that certain genetic elements, such as enhancers, may be 
operably linked even at a distance, i.e., even if not contiguous. 

The term "recombinant" polynucleotide refers to a 
polynucleotide which is made by the combination of two otherwise 
separated segments of sequence accomplished by the artificial 
manipulation of isolated segments of polynucleotides by genetic 
engineering techniques or by chemical synthesis. In so doing one 
may join together polynucleotide segments of desired functions 
to generate a desired combination of functions. 

Polynucleotide probes include an isolated polynucleotide 
attached to a label or reporter molecule and may be used to 
identify and isolate other Arg-gingipain coding sequences. 
Probes comprising synthetic oligonucleotides or other 
polynucleotides may be derived from naturally occurring or 
recombinant single or double stranded nucleic acids or be 
chemically synthesized. Polynucleotide probes may be labelled 
by any of the methods known in the art, e.g., random hexamer 
labeling, nick translation, or the Klenow fill-in reaction. 

Large amounts of the polynucleotides may be produced by 
replication in a suitable host cell. Natural or synthetic DNA 
fragments coding for a proteinase or a fragment thereof will be 
incorporated into recombinant polynucleotide constructs, 
typically DNA constructs, capable of introduction into and 
replication in a prokaryotic or eukaryotic cell. Usually the 
construct will be suitable for replication in a unicellular host, 
such as yeast or bacteria, but a multicellular eukaryotic host 
may also be appropriate, with or without integration within the 
genome of the host cells. Commonly used prokaryotic hosts 
include strains of Escherichia coli, although other prokaryotes, 
such as Bacillus subtilis or Pseudomonas, may also be used. 
Mammalian or other eukaryotic host cells include yeast, 
filamentous fungi, plant, insect, amphibian and avian species. 
Such factors as ease of manipulation, ability to appropriately 
glycosylate expressed proteins, degree and control of protein 
expression, ease of purification of expressed proteins away from 
cellular contaminants or other factors may determine the choice 
of the host cell. 

The polynucleotides may also be produced by chemical 
synthesis, e.g., by the phosphoramidite method described by 
Beaucage and Caruthers (1981) Tetra. Letts. 22:1859-1862 or the 
triester method according to Matteucci et al. (1981) J. Am. Chem. 
Soc. 103:3185, and may be performed on commercial automated 
oligonucleotide synthesizers. A double-stranded fragment may be 
obtained from the single stranded product of chemical synthesis 
either by synthesizing the complementary strand and annealing the 
strands together under appropriate conditions or by adding the 
complementary strand using DNA polymerase with an appropriate 
primer sequence. 

DNA constructs prepared for introduction into a prokaryotic 
or eukaryotic host will typically comprise a replication system 
(i.e., vector) recognized by the host, including the intended DNA 
fragment encoding the desired polypeptide, and will preferably 
also include transcription and translational initiation 
regulatory sequences operably linked to the polypeptide-encoding 
segment. Expression systems (expression vectors) may include, 
for example, an origin of replication or autonomously replicating 
sequence (ARS) and expression control sequences, a promoter, an 
enhancer and necessary processing information sites, such as 
ribosome-binding sites, RNA splice sites, polyadenylation sites, 
transcriptional terminator sequences, and mRNA stabilizing 
sequences. Signal peptides may also be included where 
appropriate from secreted polypeptides of the same or related 
species, which allow the protein to cross and/or lodge in cell 
membranes or be secreted from the cell. 

An appropriate promoter and other necessary vector sequences 
will be selected so as to be functional in the host. Examples 
of workable combinations of cell lines and expression vectors are 
described in Sambrook et al. (1989) vide infra; Ausubel et al. 
(Eds.) (1987) Current Protocols in Molecular Biology, Greene 
Publishing and Wiley Interscience, New York; and Metzger et al. 
(1988) Nature 334:31-36. Many useful vectors for expression 
in bacteria, yeast, mammalian, insect, plant or other cells are 
well known in the art and may be obtained from such vendors as 
Stratagene, New England Biolabs, Promega Biotech, and others. 
In addition, the construct may be joined to an amplifiable gene 
(e.g., DHFR) so that multiple copies of the gene may be made. 
For appropriate enhancer and other expression control sequences, 
see also Enhancers and Eukaryotic Gene Expression, Cold Spring 
Harbor Press, N.Y. (1983). While such expression vectors may 
replicate autonomously, they may less preferably replicate by 
being inserted into the genome of the host cell. 

Expression and cloning vectors will likely contain a 
selectable marker, that is, a gene encoding a protein necessary 
for the survival or growth of a host cell transformed with the 
vector. Although such a marker gene may be carried on another 
polynucleotide sequence co-introduced into the host cell, it is 
most often contained on the cloning vector. Only those host 
cells into which the marker gene has been introduced will survive 
and/or grow under selective conditions. Typical selection genes 
encode proteins that (a) confer resistance to antibiotics or 
other toxic substances, e.g., ampicillin, neomycin, methotrexate, 
etc.; (b) complement auxotrophic deficiencies; or (c) supply 
critical nutrients not available from complex media. The choice 
of the proper selectable marker will depend on the host cell; 
appropriate markers for different hosts are known in the art. 

The recombinant vectors containing the Arg-gingipain coding 
sequences of interest can be introduced (transformed, 
transfected) into the host cell by any of a number of appropriate 
means, including electroporation; transformation or transfection 
employing calcium chloride, rubidium chloride, calcium phosphate, 
DEAE-dextran, or other substances; microprojectile bombardment; 
lipofection; and transfection or infection (where the vector is 
an infectious agent, such as a viral or retroviral genome). The 
choice of such means will often depend on the host cell. Large 
quantities of the polynucleotides and polypeptides of the present 
invention may be prepared by transforming suitable prokaryotic 
or eukaryotic host cells with gingipain-1-encoding 
polynucleotides of the present invention in compatible vectors 
or other expression vehicles and culturing such transformed host 
cells under conditions suitable to attain expression of the 
Arg-gingipain-encoding gene. The Arg-gingipain may then be 
recovered from the host cell and purified. 

The coding sequence for the "mature" form of Arg-gingipain-2 
is expressed after PCR site-directed mutagenesis and cloning into 
an expression vector suitable for use in E. coli, for example. 
Exemplary expression vectors for E. coli and other host cells are 
given, for example, in Sambrook et al. (1989), vide infra, and in 
Pouwels et al. (Eds.) (1986) Cloning Vectors, Elsevier Science 
Publishers, Amsterdam, the Netherlands. 

In order to eliminate leader sequences and precursor 
sequences at the 5' side of the coding sequence, a combination 
of restriction endonuclease cutting and site-directed mutagenesis 
via PCR is used, with an oligonucleotide containing a desired 
restriction site for cloning (one not present in the coding 
sequence), a ribosome binding site, a translation initiation 
codon (ATG) and the codons for the first amino acids of the 
mature Arg-gingipain-2. The oligonucleotide for site-directed 
mutagenesis at the 3' end of the coding sequence for mature 
gingipain-1 includes nucleotides encoding the carboxy-terminal 
amino acids of mature gingipain-1, a translation termination 
codon (TAA, TGA or TAG), and a second suitable restriction 
endonuclease recognition site not present in the remainder of 
the DNA sequence to be inserted into the expression vector. The 
site-directed mutagenesis strategy is similar to that of Boone et 
al. (1990) Proc. Natl. Acad. Sci. USA 87:2800-2804, as modified 
for use with PCR. 
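
The layout of the two mutagenic oligonucleotides described above can be sketched as simple string assembly. This is an illustration only, not the primers actually used: the function names are hypothetical, and the restriction sites, ribosome binding site and codons in the usage example are placeholders. A practical primer would also need a spacer between the ribosome binding site and the ATG and enough matching bases to anneal.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}
STOP_CODONS = {"TAA", "TGA", "TAG"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def site_absent(site, insert):
    """True if the recognition site does not occur in the insert, on either strand."""
    site, insert = site.upper(), insert.upper()
    return site not in insert and reverse_complement(site) not in insert


def forward_primer(site, rbs, first_codons):
    """5' oligonucleotide: cloning site, ribosome binding site, ATG, first mature codons."""
    return (site + rbs + "ATG" + first_codons).upper()


def reverse_primer(last_codons, stop_codon, site):
    """3' oligonucleotide: last codons, stop codon and second site, as the
    reverse complement of the coding strand."""
    if stop_codon.upper() not in STOP_CODONS:
        raise ValueError("stop codon must be TAA, TGA or TAG")
    return reverse_complement(last_codons + stop_codon + site)


if __name__ == "__main__":
    # Placeholder values for illustration only.
    print(forward_primer("GGATCC", "AGGAGG", "TACACA"))
    print(reverse_primer("GCTGTA", "TAA", "AAGCTT"))
```

Before choosing the two sites, `site_absent` can be run against the insert to confirm that neither site occurs within the sequence to be cloned, as the text requires.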

In another embodiment, polyclonal and/or monoclonal 
antibodies capable of specifically binding to a proteinase or 
fragments thereof are provided. The term antibody is used to 
refer both to a homogeneous molecular entity and to a mixture such 
as a serum product made up of a plurality of different molecular 
entities. Monoclonal or polyclonal antibodies specifically 
reacting with the Arg-gingipains may be made by methods known in 
the art. See, e.g., Harlow and Lane (1988) Antibodies: A 
Laboratory Manual, Cold Spring Harbor Laboratories; Goding (1986) 
Monoclonal Antibodies: Principles and Practice, 2d ed., Academic 
Press, New York; and Ausubel et al. (1987) supra. Also, 
recombinant immunoglobulins may be produced by methods known in 
the art, including but not limited to the methods described in 
U.S. Patent No. 4,816,567. Monoclonal antibodies with affinities 
of 10^8 M^-1, preferably 10^9 to 10^10 M^-1 or more, are preferred. 

Antibodies specific for Arg-gingipains may be useful, for 
example, as probes for screening DNA expression libraries or for 
detecting the presence of Arg-gingipains in a test sample. 
Frequently, the polypeptides and antibodies will be labeled by 
joining, either covalently or noncovalently, a substance which 
provides a detectable signal. Suitable labels include but are 
not limited to radionuclides, enzymes, substrates, cofactors, 
inhibitors, fluorescent agents, chemiluminescent agents, magnetic 
particles and the like. United States Patents describing the use 
of such labels include but are not limited to Nos. 3,817,837; 
3,850,752; 3,939,350; 3,996,345; 4,277,437; 4,275,149; and 
4,366,241. 

Antibodies specific for Arg-gingipain(s) and capable of 
inhibiting its proteinase activity may be useful in treating 
animals, including man, suffering from periodontal disease. Such 
antibodies can be obtained by the methods described above and 
subsequently screening the Arg-gingipain-specific antibodies for 
their ability to inhibit proteinase activity. 

Compositions and immunogenic preparations including vaccine 
compositions comprising substantially purified recombinant Arg- 
gingipain(s) and a suitable carrier therefor are provided. 
Alternatively, hydrophilic regions of the proteolytic component 
or hemagglutinin component(s) of Arg-gingipain can be identified 
by the skilled artisan, and peptide antigens can be synthesized 
and conjugated to a suitable carrier protein (e.g., bovine serum 
albumin or keyhole limpet hemocyanin) for use in vaccines or in 
raising antibody specific for Arg-gingipains. Immunogenic 
compositions are those which result in specific antibody 
production when injected into a human or an animal. Such 
vaccines are useful, for example, in immunizing an animal, 
including humans, against inflammatory response and tissue damage 
caused by P. gingivalis in periodontal disease. The vaccine 
preparations comprise an immunogenic amount of one or more 
Arg-gingipains or an immunogenic fragment(s) or subunit(s) thereof. 
Such vaccines may comprise one or more Arg-gingipain proteinases, 
alone or in combination with another protein or other immunogen. 
By "immunogenic amount" is meant an amount capable of eliciting 
the production of antibodies directed against Arg-gingipain(s) in 
an individual to which the vaccine has been administered. 

Immunogenic carriers may be used to enhance the 
immunogenicity of the proteinases. Such carriers include but are 
not limited to proteins and polysaccharides, liposomes, and 
bacterial cells and membranes. Protein carriers may be joined 
to the proteinases to form fusion proteins by recombinant or 
synthetic means or by chemical coupling. Useful carriers and 
means of coupling such carriers to polypeptide antigens are known 
in the art. 

The vaccines may be formulated by any of the means known in 
the art. Such vaccines are typically prepared as injectables, 
either as liquid solutions or suspensions. Solid forms suitable 
for solution in, or suspension in, liquid prior to injection may 
also be prepared. The preparation may also, for example, be 
emulsified, or the protein encapsulated in liposomes. 

The active immunogenic ingredients are often mixed with 
excipients or carriers which are pharmaceutically acceptable and 
compatible with the active ingredient. Suitable excipients 
include but are not limited to water, saline, dextrose, glycerol, 
ethanol, or the like and combinations thereof. The concentration 
of the immunogenic polypeptide in injectable formulations is 
usually in the range of 0.2 to 5 mg/ml. 

In addition, if desired, the vaccines may contain minor 
amounts of auxiliary substances such as wetting or emulsifying 
agents, pH buffering agents, and/or adjuvants which enhance the 
effectiveness of the vaccine. Examples of adjuvants which may 
be effective include but are not limited to: aluminum hydroxide; 
N-acetyl-muramyl-L-threonyl-D-isoglutamine (thr-MDP); 
N-acetyl-nor-muramyl-L-alanyl-D-isoglutamine (CGP 11637, referred 
to as nor-MDP); 
N-acetylmuramyl-L-alanyl-D-isoglutaminyl-L-alanine-2-(1'-2'-dipalmitoyl-sn-glycero-3-hydroxyphosphoryloxy)-ethylamine 
(CGP 19835A, referred to as MTP-PE); and RIBI, which contains 
three components extracted from bacteria, monophosphoryl lipid 
A, trehalose dimycolate and cell wall skeleton (MPL+TDM+CWS) in 
a 2% squalene/Tween 80 emulsion. The effectiveness of an 
adjuvant may be determined by measuring the amount of antibodies 
directed against the immunogen resulting from administration of 
the immunogen in vaccines which are also comprised of the various 
adjuvants. Such additional formulations and modes of 
administration as are known in the art may also be used. 

50 kDa Arg-gingipain or high molecular weight Arg-gingipain 
and fragments thereof may be formulated into vaccines as neutral 
or salt forms. Pharmaceutically acceptable salts include but are 
not limited to the acid addition salts (formed with free amino 
groups of the peptide) which are formed with inorganic acids, 
e.g., hydrochloric acid or phosphoric acids; and organic acids, 
e.g., acetic, oxalic, tartaric, or maleic acid. Salts formed 
with the free carboxyl groups may also be derived from inorganic 
bases, e.g., sodium, potassium, ammonium, calcium, or ferric 
hydroxides, and organic bases, e.g., isopropylamine, 
trimethylamine, 2-ethylamino-ethanol, histidine, and procaine. 

The vaccines are administered in a manner compatible with 
the dosage formulation, and in such amount as will be 
prophylactically and/or therapeutically effective. The quantity 
to be administered, which is generally in the range of about 100 
to 1,000 μg of protein per dose, more generally in the range of 
about 5 to 500 μg of protein per dose, depends on the subject to 
be treated, the capacity of the individual's immune system to 
synthesize antibodies, and the degree of protection desired. 
Precise amounts of the active ingredient required to be 
administered may depend on the judgment of the physician or 
doctor of dental medicine and may be peculiar to each individual, 
but such a determination is within the skill of such a 
practitioner. 

The vaccine or other immunogenic composition may be given 
in a single dose or multiple dose schedule. A multiple dose 
schedule is one in which a primary course of vaccination may 
include 1 to 10 or more separate doses, followed by other doses 
administered at subsequent time intervals as required to maintain 
and/or reinforce the immune response, e.g., at 1 to 4 months for 
a second dose, and if needed, a subsequent dose(s) after several 
months. 

Recombinant Arg-gingipains are useful in methods of 
identifying agents that modulate proteinase activity, e.g., by 
acting on the proteinase itself. One such method comprises the 
steps of incubating Arg-gingipain-1 (or high molecular weight 
Arg-proteinase) with a putative therapeutic agent; determining 
the activity of the proteinase incubated with the agent; and 
comparing the activity so obtained with the activity of a 
control sample of proteinase that has not been incubated with the 
agent. 
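
The comparison in the final step can be expressed as a percent inhibition relative to the untreated control. This is a minimal sketch with a hypothetical function name and made-up activity values; the text does not prescribe any particular formula or activity unit.

```python
def percent_inhibition(activity_with_agent, control_activity):
    """Percent reduction in proteinase activity relative to the untreated control.

    Both activities must be in the same units. A negative result means the
    agent increased activity rather than inhibiting it.
    """
    if control_activity <= 0:
        raise ValueError("control activity must be positive")
    return 100.0 * (1.0 - activity_with_agent / control_activity)


if __name__ == "__main__":
    # Hypothetical readings in arbitrary activity units.
    print(percent_inhibition(25.0, 100.0))
```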

All references cited herein are hereby incorporated by 
reference in their entirety. 

Except as noted hereafter, standard techniques for cloning, 
DNA isolation, amplification and purification, for enzymatic 
reactions involving DNA ligase, DNA polymerase, restriction 
endonucleases and the like, and various separation techniques are 
those known and commonly employed by those skilled in the art. 
A number of standard techniques are described in Sambrook et al. 
(1989) Molecular Cloning, Second Edition, Cold Spring Harbor 
Laboratory, Plainview, New York; Maniatis et al. (1982) Molecular 
Cloning, Cold Spring Harbor Laboratory, Plainview, New York; Wu 
(ed.) (1993) Meth. Enzymol. 218, Part I; Wu (ed.) (1979) Meth. 
Enzymol. 68; Wu et al. (eds.) (1983) Meth. Enzymol. 100 and 101; 
Grossman and Moldave (eds.) Meth. Enzymol. 65; Miller (ed.) 
(1972) Experiments in Molecular Genetics, Cold Spring Harbor 
Laboratory, Cold Spring Harbor, New York; Old and Primrose (1981) 
Principles of Gene Manipulation, University of California Press, 
Berkeley; Schleif and Wensink (1982) Practical Methods in 
Molecular Biology; Glover (ed.) (1985) DNA Cloning Vol. I and II, 
IRL Press, Oxford, UK; Hames and Higgins (eds.) (1985) Nucleic 
Acid Hybridization, IRL Press, Oxford, UK; Setlow and Hollaender 
(1979) Genetic Engineering: Principles and Methods, Vols. 1-4, 
Plenum Press, New York. Abbreviations and nomenclature, where 
employed, are deemed standard in the field and commonly used in 
professional journals such as those cited herein. 

The foregoing discussion and the following examples 
illustrate but are not intended to limit the invention. The 
skilled artisan will understand that alternative methods may be 
used to implement the invention. 

EXAMPLES 

Example 1 Purification of Gingjpain Enzvmes 

Example l.i Bacterial Cultivation 

P. gingivalis strains H66 (ATCC 33277) and W50 (ATCC 53978) 
(virulent) were used in these studies. Cells were grown in 500 
ml of broth containing 15.0 g Trypticase Soy Broth (Difco, 
Detroit, Michigan), 2.5 g yeast extract, 2.5 mg hemin, 0.25 g 
cysteine, 0.05 g dithiothreitol, 0.5 mg menadione (all from Sigma 
Chemical Company, St. Louis, MO) anaerobically at 37°C for 48 hr 
in an atmosphere of 85% N2, 10% CO2, 5% H2. The entire 500 ml 
culture was used to inoculate 20 liters of the same medium, and 
the latter was incubated in a fermentation tank at 37°C for 48 
hr (to a final optical density of 1.8 at 650 nm). 
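
Because the 20-liter fermentation uses the same medium as the 500 ml starter culture, each component scales by a factor of 40. The following Python sketch shows that arithmetic only; the dictionary layout and the helper name scale_recipe are illustrative assumptions and are not part of the protocol as written.

```python
# Starter-culture recipe from Example 1.1 (grams per 500 ml of broth).
STARTER_VOLUME_ML = 500.0
STARTER_RECIPE_G = {
    "Trypticase Soy Broth": 15.0,
    "yeast extract": 2.5,
    "hemin": 0.0025,         # 2.5 mg
    "cysteine": 0.25,
    "dithiothreitol": 0.05,
    "menadione": 0.0005,     # 0.5 mg
}


def scale_recipe(recipe_g, from_volume_ml, to_volume_ml):
    """Scale every component linearly with culture volume (hypothetical helper)."""
    factor = to_volume_ml / from_volume_ml
    return {name: grams * factor for name, grams in recipe_g.items()}


# 20 liters of the same medium for the fermentation tank.
fermenter = scale_recipe(STARTER_RECIPE_G, STARTER_VOLUME_ML, 20_000.0)
for name, grams in fermenter.items():
    print(f"{name}: {grams:g} g per 20 L")
```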




Example 1.2 Purification of Low Molecular Weight Arg-gingipain 

1200 ml cell-free supernatant was obtained from the 48 hr 
culture by centrifugation at 18,000 x g for 30 min at 4°C. 
Proteins in the supernatant were precipitated out by 90% 
saturation with ammonium sulfate. After 2 hr at 4°C, the 
suspension was centrifuged at 18,000 x g for 30 min. The 
resulting pellet was dissolved in 0.05 M sodium acetate buffer, 
pH 4.5, 0.15 M NaCl, 5 mM CaCl2; the solution was dialyzed against 
the same buffer overnight at 4°C, with three changes, at a 
buffer:protein solution ratio larger than 150:1. The dialysate was 
then centrifuged at 25,000 x g for 30 min, and the dark brown 
supernatant (26 ml) was then chromatographed over an agarose gel 
filtration column (5.0 x 150 cm; Sephadex G-150, Pharmacia, 
Piscataway, NJ) which had been pre-equilibrated with the same 
buffer. The column was developed with said buffer at a flow rate 
of 36 ml/hr. 6 ml fractions were collected and assayed for both 
amidolytic and proteolytic activities, using Bz-L-Arg-pNA and 
azocasein as substrates. Four peaks containing amidolytic 
activity were identified (Fig. 1). The fractions corresponding 
to peak 4 were combined, concentrated by ultrafiltration (Amicon 
PM-10 membrane; Amicon, Beverly, MA) and then dialyzed overnight 
against 0.05 Bis-Tris, 5 mM CaCl2, pH 6.0. The volume of the 
dialysate was 14 ml. 

The 14 ml dialysate from the previous step was then applied 
to a DEAE-cellulose (Whatman, Maidstone, England) column (1 x 10 
cm) equilibrated with 0.05 mM Bis-Tris, 5 mM CaCl2, pH 6.0. The 
column was then washed with an additional 100 ml of the same 
buffer. About 75% of the amidolytic activity, but only about 50% 
of the protein, passed through the column. The column wash fluid 
was dialyzed against 0.05 M sodium acetate buffer containing 5 
mM CaCl2 (pH 4.5). This 19 ml dialysate was applied to a Mono S 
FPLC column (Pharmacia LKB Biotechnology Inc., Piscataway, NJ) 
equilibrated with the same buffer. The column was washed with 
the starting buffer at a flow rate of 1.0 ml/min for 20 min. 
Bound proteins were eluted first with a linear NaCl gradient (0 
to 0.1 M) followed by a second linear NaCl gradient (0.1 to 0.25 
M), each gradient applied over a 25 min time period. Fractions 
were assayed for amidolytic activity using Bz-L-Arg-pNA. 
Fractions with activity were pooled and re-chromatographed using 
the same conditions. Although not detectable by gel 
electrophoresis, trace contamination by a proteinase capable of 
cleaving after lysyl residues was sometimes observed. This 
contaminating activity was readily removed by applying the sample 
to an arginyl-agarose column (L-Arginyl-SEPHAROSE 4B) 
equilibrated with 0.025 M Tris-HCl, 5 mM CaCl2, 0.15 M NaCl, pH 
7.5. After washing with the same buffer, purified enzyme was 
eluted with 0.05 M sodium acetate buffer, 5 mM CaCl2, pH 4.5. 
Yields of gingipain-1 were markedly reduced by this step (about 
60%). 
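
The two-segment Mono S elution program described above can be written as a piecewise-linear function of time. The Python sketch below shows one way to tabulate the programmed NaCl concentration. It assumes ideal mixing and ignores any delay volume between the gradient former and the column; the function name nacl_molar is illustrative only.

```python
def nacl_molar(t_min, segments):
    """Return the programmed NaCl concentration (M) at time t_min.

    segments is a list of (duration_min, start_M, end_M) tuples that are
    applied back to back; each segment is a linear ramp.
    """
    elapsed = 0.0
    for duration, start, end in segments:
        if t_min <= elapsed + duration:
            fraction = (t_min - elapsed) / duration
            return start + fraction * (end - start)
        elapsed += duration
    return segments[-1][2]  # hold the final concentration after the program ends


# Mono S program from Example 1.2: 0 to 0.1 M, then 0.1 to 0.25 M, 25 min each.
MONO_S = [(25.0, 0.0, 0.1), (25.0, 0.1, 0.25)]

for t in (0, 12.5, 25, 37.5, 50):
    print(f"t = {t:>4} min  ->  {nacl_molar(t, MONO_S):.3f} M NaCl")
```

The second segment is steeper (0.15 M over 25 min) than the first (0.1 M over 25 min), so proteins that bind more tightly are eluted over a wider salt range in the same time.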

Example 1.3 High Molecular Weight Arg-gingipain Purification 

The culture supernatant (2,900 ml) was obtained by 
centrifugation of the whole culture (6,000 x g, 30 min, 4°C). 
Chilled acetone (4,350 ml) was added to this fraction over a 
period of 15 min, with the temperature of the solution maintained 
below 0°C at all times using an ice/salt bath, and this mixture 
was centrifuged (6,000 x g, 30 min, -15°C). The precipitate was 
dissolved in 290 ml of 20 mM Bis-Tris-HCl, 150 mM NaCl, 5 mM 
CaCl2, 0.02% (w/v) NaN3, pH 6.8 (Buffer A), and dialyzed against 
Buffer A containing 1.5 mM 4,4'-dithiodipyridine disulfide for 
4 h, followed by 2 changes of Buffer A overnight. The dialyzed 
fraction was centrifuged (27,000 x g, 30 min, 4°C), following 
which it was concentrated to 40 ml by ultrafiltration using an 
Amicon PM-10 membrane. This concentrated fraction was applied 
to a Sephadex G-150 column (5 x 115 cm = 2260 ml; Pharmacia, 
Piscataway, NJ) which had previously been equilibrated with 
Buffer A, and the fractionation was carried out at 30 ml/h (1.5 
cm/h). Fractions (9 ml) were assayed for activity against Bz-L-
Arg-pNA and Z-L-Lys-pNA (Novabiochem; 0.5 mM). Amidolytic 
activities for Bz-L-Arg-pNA (0.5 mM) or Z-L-Lys-pNA were measured 
in 0.2 M Tris-HCl, 1 mM CaCl2, 0.02% (w/v) NaN3, 10 mM L-cysteine, 
pH 7.6. General proteolytic activity was measured with azocasein 
(2% w/v) as described by Barrett and Kirschke (1981) Meth. 
Enzymol. 80, 535-561 for cathepsin L. Three peaks with activity 
against the two substrates were found. The first (highest 
molecular weight) peak of activity was pooled, concentrated to 
60 ml using ultrafiltration and dialyzed overnight against two 
changes of 50 mM Tris-HCl, 1 mM CaCl2, 0.02% NaN3, pH 7.4 (Buffer 
B). 

This high MW fraction was applied to an L-Arginine-Sepharose 
column (1.5 x 30 cm = 50 ml), which had previously been 
equilibrated with Buffer B at a flow rate of 20 ml/hr (11.3 
cm/h), following which the column was washed with two column 
volumes of Buffer B. Following this, a step gradient of 500 mM 
NaCl was applied in Buffer B and the column was washed with this 
concentration of NaCl until the A280 baseline fell to zero. After 
re-equilibration of the column in Buffer B, a gradient from 0-750 
mM L-lysine was applied in a total volume of 300 ml, followed by 
100 ml of 750 mM L-lysine. The column was once again re-
equilibrated with Buffer B and a further gradient to 100 mM L-
arginine in 300 ml was applied in the same way. Fractions (6 ml) 
from the Arg wash were assayed for activity against the two 
substrates as described previously. The arginine gradient 
eluted a major peak for an enzyme degrading Bz-L-Arg-pNA. The 
active fractions were pooled and dialyzed against two changes of 
20 mM Bis-Tris-HCl, 1 mM CaCl2, 0.02% (w/v) NaN3, pH 6.4 (Buffer 
C) and concentrated down to 10 ml using an Amicon PM-10 membrane. 

The concentrate with activity for cleaving Bz-L-Arg-pNA was 
applied to a Mono Q FPLC column (Pharmacia LKB Biotechnology Inc., 
Piscataway, NJ) equilibrated in Buffer C; the column was washed 
with 5 column volumes of Buffer C at 1.0 ml/min, following which 
bound protein was eluted with a 3-step gradient [0-200 mM NaCl 
(10 min), followed by 200-250 mM NaCl (15 min) and 250-500 mM 
NaCl (5 min)]. The active fractions from Mono Q were pooled and 
used for further analyses. 
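
The column dimensions, bed volumes and linear flow rates quoted in Example 1.3 are related by simple cylinder geometry. The Python sketch below recomputes them as a consistency check, assuming an ideal cylindrical bed filled to the stated height; the function names are illustrative. The computed values agree, within rounding, with the 2260 ml and 1.5 cm/h stated for the Sephadex G-150 column and the 50 ml and 11.3 cm/h stated for the L-Arginine-Sepharose column.

```python
import math


def bed_volume_ml(diameter_cm, height_cm):
    """Volume of a cylindrical column bed; 1 cm^3 equals 1 ml."""
    return math.pi * (diameter_cm / 2.0) ** 2 * height_cm


def linear_velocity_cm_per_h(flow_ml_per_h, diameter_cm):
    """Volumetric flow divided by the column cross-sectional area."""
    return flow_ml_per_h / (math.pi * (diameter_cm / 2.0) ** 2)


# (label, diameter in cm, bed height in cm, flow in ml/h) as stated in Example 1.3
columns = [
    ("Sephadex G-150", 5.0, 115.0, 30.0),
    ("L-Arginine-Sepharose", 1.5, 30.0, 20.0),
]
for label, diameter, height, flow in columns:
    print(f"{label}: {bed_volume_ml(diameter, height):.0f} ml bed, "
          f"{linear_velocity_cm_per_h(flow, diameter):.1f} cm/h")
```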
Example 2 Molecular Weight Determination 

The molecular weight of the purified Arg-gingipain-1 was 
estimated by gel filtration on a Superose 12 column (Pharmacia, 
Piscataway, NJ) and by Tricine-SDS polyacrylamide gel 
electrophoresis. In the latter case, 1 mM TLCK was used to 
inactivate the protease prior to boiling, thus preventing 
autoproteolytic digestion. 

Example 3 Enzyme Assays 

Amidolytic activities of P. gingivalis proteinases were 
measured with the substrates MeO-Suc-Ala-Ala-Pro-Val-pNA at a 
concentration of 0.5 mM, Suc-Ala-Ala-Ala-pNA (0.5 mM), Suc-Ala-
Ala-Pro-Phe-pNA (0.5 mM), Bz-Arg-pNA (1.0 mM), Cbz-Phe-Leu-Glu-
pNA (0.2 mM); S-2238, S-2222, S-2288 and S-2251 each at a 
concentration of 0.05 mM; in 1.0 ml of 0.2 M Tris-HCl, 5 mM CaCl2, 
pH 7.5. In some cases 5 mM cysteine and/or 50 mM glycyl-
glycine (Gly-Gly) was also added to the reaction mixture. 

For routine assays, pH optimum determination and measurement 
of the effect of stimulating agents and inhibitors on trypsin-
like enzymes, only Bz-L-Arg-pNA was used as substrate. Potential 
inhibitory or stimulatory compounds were preincubated with enzyme 
for up to 20 min at room temperature at pH 7.5, in the presence 
of 5 mM CaCl2 (except when testing the effects of chelating 
agents) prior to the assay for enzyme activity. 

General proteolytic activity was assayed using the same 
buffer system as described for detecting amidolytic activity, but 
using azocoll or azocasein (1% w/v) as substrate. 

A unit of Arg-gingipain-1 enzymatic activity is based on the 
spectroscopic assay using benzoyl-Arg-p-nitroanilide as substrate 
and recording Δ absorbance units at 405 nm/min/absorbance unit 
at 280 nm according to the method of Chen et al. (1992) supra. 
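
The activity unit defined for Arg-gingipain-1 in Example 3 (change in absorbance at 405 nm per minute, per absorbance unit at 280 nm) can be computed from a time course of A405 readings. The Python sketch below is one possible way to do this, assuming the readings lie in the linear part of the progress curve and taking the rate as a least-squares slope. The function names and the readings are invented for illustration and are not data from this document or from Chen et al.

```python
def slope_per_min(times_min, absorbances):
    """Least-squares slope of absorbance versus time (absorbance units/min)."""
    n = len(times_min)
    mean_t = sum(times_min) / n
    mean_a = sum(absorbances) / n
    num = sum((t - mean_t) * (a - mean_a) for t, a in zip(times_min, absorbances))
    den = sum((t - mean_t) ** 2 for t in times_min)
    return num / den


def activity_units(times_min, a405, a280):
    """Delta A405 per minute, normalized to the A280 of the enzyme sample."""
    return slope_per_min(times_min, a405) / a280


# Invented readings, for illustration only.
times = [0.0, 1.0, 2.0, 3.0]
a405 = [0.10, 0.15, 0.20, 0.25]
print(activity_units(times, a405, a280=0.5))
```

Normalizing by A280 expresses activity relative to the amount of protein present, so fractions of different concentration can be compared directly.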
Example 4 Enzyme Specificity 

Purified Arg-gingipain-1 (0.8 µg) in 50 mM ammonium 
bicarbonate buffer, pH 7.7, 5 mM CaCl2, was preincubated with 2 
mM cysteine for 10 min, followed by the addition of either 
oxidized insulin B chain (225 µg) or melittin (225 µg) at 25°C. 
Samples were removed after various time intervals, and the 
reaction mixtures were subjected to HPLC (reverse phase column, 
MicroPak SP C-18 column) using linear gradients (0.08% 
trifluoroacetic acid to 0.08% trifluoroacetic acid plus 80% 
acetonitrile) over a 45 min period (flow rate 1.0 ml/min). 
Peptides were detected by monitoring A220. Product peaks were 
collected and subjected to amino acid analysis and/or amino-
terminal sequence analysis. 

Example 5 Amino Acid Sequence Analysis 

Amino-terminal amino acid sequence analysis of either Arg-
gingipain-1 or degradation products from proteolytic reactions 
was carried out using an Applied Biosystems 4760A gas-phase 
sequenator, using the program designed by the manufacturer. 

The amino acid sequence of the COOH terminus of SDS-
denatured Arg-gingipain-1 and of Arg-gingipain-2 was determined. 
10 nmol aliquots of gingipain-1 were digested in 0.2 M N-
ethylmorpholine acetate buffer, pH 8.0, with carboxypeptidase A 
and B at room temperature, using 1:100 and 1:50 molar ratios, 
respectively. Samples were removed at intervals spanning 0 to 
12 hours, boiled to inactivate the carboxypeptidase, and protein 
was precipitated with 20% trichloracetic acid. Amino acid 
analysis was performed on the supernatants. 

Example 6 Materials 

MeO-Suc-Ala-Ala-Pro-Val-pNA, Suc-Ala-Ala-Pro-Phe-pNA, Gly-
Pro-pNA, Suc-Ala-Ala-Ala-pNA, Bz-Arg-pNA, 
diisopropylfluorophosphate, phenylmethylsulfonyl fluoride, tosyl-
L-lysine chloromethyl ketone (TLCK), tosyl-L-phenylalanine 
chloromethyl ketone (TPCK), trans-epoxysuccinyl-L-leucylamide-(4-
guanidino)butane, an inhibitor of cysteine proteinases, 
leupeptin, antipain and azocasein were obtained from Sigma 
Chemical Co., St. Louis, MO. 3,4-Dichloroisocoumarin was 
obtained from Boehringer, Indianapolis, IN and CBz-Phe-Leu-Glu-
pNA and azocoll were obtained from Calbiochem, La Jolla, CA. S-
2238 (D-Phe-Pip-Arg-pNA), S-2222 (Bz-Ile-Glu-(γ-OR)-Gly-Arg-pNA), 
S-2288 (D-Ile-Pro-Arg-pNA), and S-2251 (D-Val-Leu-Lys-pNA) were 
from Kabi-Vitrum (Beaumont, Texas). 

Example 7 Electrophoresis 

SDS-PAGE of Arg-gingipain-1 was performed as in Laemmli 
(1970) Nature 227:680-685. Prior to electrophoresis the samples 
were boiled in a buffer containing 20% glycerol, 4% SDS, and 0.1% 
bromphenol blue. The samples were run under reducing conditions 
by adding 2% β-mercaptoethanol unless otherwise noted. Samples 
were heated for 5 min at 100°C prior to loading onto gels. A 5-
15% gradient gel was used for the initial digests of C3 and C5, 
and the gels were subsequently stained with Coomassie Brilliant 
Blue R. The C5 digest used to visualize breakdown products 
before and after reduction of the disulfide bonds was 
electrophoresed in an 8% gel. Attempts to visualize C5a in the 
C5 digest were carried out using 13% gels that were developed 
with silver stain according to the method of Merril et al. (1979) 
Proc. Natl. Acad. Sci. USA 76, 4335-4340. 

In some experiments (high molecular weight forms) SDS-PAGE 
using Tris-HCl/Tricine buffer was carried out per Schagger and von 
Jagow (1987) Analyt. Biochem. 166, 368-379. 

Electrophoresis on cellulose acetate strips was performed 
in 0.075 barbital buffer at pH 8.5 and 4°C for 30 min at 200 V. 
The Beckman Microzone apparatus (model R101) was used for the 
electrophoresis of the protein, and the strips were stained using 
Amido Black. 

Example 8 Oligonucleotide Synthesis 

Oligonucleotide primers for PCR probes and sequencing were 
synthesized by the phosphoramidite method with an Applied 
Biosystems model 394 automated DNA synthesizer (Applied 
Biosystems, Foster City, CA) and purified by PAGE and desalted 
on Sep-Pak (Millipore Corp., Beverly, MA) using standard 
protocols. Primer GIN-1-32 was designed to bind to the noncoding 
strand of Arg-gingipain DNA corresponding to the NH2-terminal 
portion of the mature protein, i.e., to the sequence encoding 
amino acids 2-8 within SEQ ID NO:1. The sequence of the 32-base 
primer consists of 20 bases specific for Arg-gingipain and six 
additional bases at the 5' end (underlined), as follows: 5'-
GGCTTTACNCCNGTNGARGARYTNGA-3' (SEQ ID NO:6), where N is A or G 
or C or T. Primer GIN-2-30 was designed to bind to the coding 
strand of Arg-gingipain DNA corresponding to the amino acids 25-
32 of the mature protein, i.e., residues 25-32 of SEQ ID NO:1. 
The sequence of the 30-base primer consists of 24 bases specific 
for gingipain-1 (and gingipain-2) DNA and six additional bases 
at the 5' end (underlined), as follows: 
5'-GGCTTT RTTYTTCCARTCNACRAARTCYTT-3', where R is A or G, Y is C 
or T and N is A or G or C or T (SEQ ID NO:7). Primer GIN-8S-48: 
5'-CCTGGAGAATTC TCGTATGATCGTCATCGTAGCCAAAAAGTATGAGGG-3' (SEQ ID 
NO:8) was designed 
to bind to the noncoding strand of Arg-gingipain DNA 
corresponding to the amino acids 11-22 of the mature protein, 
i.e., amino acids 11-22 of SEQ ID NO:1, and was designed on the 
basis of partial DNA sequence information for the Arg-gingipain 
coding sequence (nucleotides 1659-1694 of SEQ ID NO:3) and 
included a 6-base EcoRI restriction site plus six additional 
bases at the 5' end (underlined). This primer was used as a 
probe to screen a λDASH P. gingivalis genomic DNA library (see 
below). One additional oligonucleotide, GIN-14-20 (a 20-mer), 
initially designed to sequence Arg-gingipain DNA, was used as a 
probe to identify and then clone the 3' end of the gingipain-1 
coding sequence, as a PstI-HindIII sequence. Primer GIN-14-20 
was designed to bind to the noncoding strand of gingipain-1 DNA 
corresponding to 20 bases specific for the 3' end of Arg-gingipain 
(nucleotides 2911-2930 within SEQ ID NO:3): 5'-
ATCAACACTAATGGTGAGCC-3' (SEQ ID NO:9). A total of 71 20-mer 
internal primers were designed using empirically determined 
sequence to sequence the Arg-gingipain locus. 
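
The design of the degenerate primer GIN-2-30 can be retraced from the peptide sequence. The Python sketch below back-translates residues 25-32 of SEQ ID NO:1 into degenerate codons using the standard genetic code and IUPAC ambiguity codes, takes the reverse complement (the primer binds the coding strand), and prepends the six extra 5' bases. The function names are illustrative; the codon table covers only the residues needed here.

```python
# Degenerate codons in IUPAC notation (R = A/G, Y = C/T, N = any base).
DEGENERATE_CODON = {
    "Lys": "AAR", "Asp": "GAY", "Phe": "TTY", "Val": "GTN",
    "Trp": "TGG", "Asn": "AAY",
}
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G", "R": "Y", "Y": "R", "N": "N"}
CHOICES = {"A": 1, "C": 1, "G": 1, "T": 1, "R": 2, "Y": 2, "N": 4}


def back_translate(residues):
    """Concatenate the degenerate codon for each residue."""
    return "".join(DEGENERATE_CODON[r] for r in residues)


def reverse_complement(seq):
    """Reverse complement that also handles the R, Y and N ambiguity codes."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def degeneracy(seq):
    """Number of distinct sequences represented by a degenerate oligonucleotide."""
    total = 1
    for base in seq:
        total *= CHOICES[base]
    return total


# Residues 25-32 of SEQ ID NO:1 as given in the sequence listing.
residues_25_32 = ["Lys", "Asp", "Phe", "Val", "Asp", "Trp", "Lys", "Asn"]
specific = reverse_complement(back_translate(residues_25_32))
gin_2_30 = "GGCTTT" + specific  # six extra 5' bases, then the 24 specific bases

print(gin_2_30, len(gin_2_30), degeneracy(gin_2_30))
```

The result reproduces the 30-base sequence given for SEQ ID NO:7 and represents a pool of 256 distinct oligonucleotides. Applying degeneracy to the sequence printed for SEQ ID NO:6 gives 2048.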
Example 9 Polymerase Chain Reaction 

The DNA template used in PCR was P. gingivalis strain H66 
total cellular DNA. The PCR was run using primer GIN-1-32 (SEQ 
ID NO:6) along with primer GIN-2-30 (SEQ ID NO:7); PCR 
consistently yielded a single 105-base pair product (P105) 
detected on a 7% acrylamide gel representing a partial gingipain 
DNA. After treatment with the Klenow enzyme, P105 was cloned in 
pCR-Script™SK(+) (Stratagene, La Jolla, CA). After sequence 
analysis of P105, specific primer GIN-8S-48 (SEQ ID NO:8) was 
designed for use as a probe. The 32P-labeled GIN-8S-48 probe was 
generated by kinase reaction for use in subsequent hybridization 
screening of the λDASH library. Incorporated nucleotides were 
separated from unincorporated nucleotides on a Sephadex G-25 
column (Boehringer Mannheim Corporation, Indianapolis, IN). 
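
The size of the P105 product follows from the primer positions given in Example 8. The Python sketch below computes the expected length, assuming the specific part of GIN-1-32 begins at the first base of the codon for residue 2, the specific part of GIN-2-30 ends at the last base of the codon for residue 32, and each primer contributes its six non-templated 5' bases to the product. The function name is illustrative.

```python
def expected_amplicon_bp(first_residue, last_residue, extra_5prime_fwd, extra_5prime_rev):
    """Length of a PCR product whose primers span first_residue..last_residue.

    Counts three bases per codon across the spanned residues and adds the
    non-templated 5' extension carried by each primer.
    """
    codons = last_residue - first_residue + 1
    return codons * 3 + extra_5prime_fwd + extra_5prime_rev


# GIN-1-32 starts at residue 2, GIN-2-30 ends at residue 32; six extra bases each.
print(expected_amplicon_bp(2, 32, 6, 6))  # prints 105
```

Thirty-one codons (93 bp) plus twelve added bases gives 105 bp, matching the single product observed on the acrylamide gel.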

Example 10 Construction and Screening of the Genomic DNA Library 

λDASH and λZAP DNA libraries were constructed according to 
the protocols of Stratagene, using the lambda DASH™ II/BamHI 
cloning kit and DNA preparations from P. gingivalis strains H66 
and W50. A library of 3x10* independent recombinant clones was 
obtained using P. gingivalis H66 DNA, and 1.5 x 10^5 independent 
recombinant clones were obtained from P. gingivalis W50 DNA. 

Approximately 3 x 10^5 phages were grown on 5 x 150 mm agar 
plates, lifted in duplicate onto supported nitrocellulose 
transfer membrane (BAS-NC, Schleicher & Schuell, Keene, NH), 
and hybridized to the 32P-labeled GIN-8S-48 probe described above. 
Hybridizations were performed overnight at 42°C in 2X Denhardt's 
solution (Denhardt, D.T. (1966), Biochem. Biophys. Res. Comm. 23, 
641-646), 6X SSC (SSC is 15 mM sodium citrate, 150 mM NaCl), 0.4% 
SDS (w/v), 500 µg/ml fish sperm DNA. The filters were washed in 
2X SSC containing 0.05% SDS (w/v) at 48°C. Seven positively 
hybridizing plaques were purified. After extraction and 
purification, the DNA was analyzed by restriction enzyme 
digestion and agarose gel electrophoresis. The 3 kb PstI 
fragment from clone A1 (P. gingivalis H66) was subsequently 
cloned into pBluescript SK(-) (Stratagene, La Jolla, CA) and 
M13mp18 and 19 and sequenced. After restriction analysis of the 
A1 clone, a SmaI/BamHI fragment was then cloned into pBluescript 
SK(-). A smaller PstI/BamHI fragment was subcloned into M13mp18 
and 19 for sequencing purposes. 3.5 and 0.5 kb BamHI fragments 
from the λZAP P. gingivalis W50 DNA library were cloned into 
pBluescript SK(-) and M13mp18 and 19 and sequenced. Standard 
protocols for cDNA library screening, lambda phage purification, 
agarose gel electrophoresis and plasmid cloning were employed 
(Maniatis et al. (1982), supra). 
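
The hybridization and wash solutions in Example 10 are multiples of the 1X SSC composition defined in the text. The Python sketch below works out the component concentrations of the 6X and 2X solutions, together with the average plating density. It assumes that "5 x 150 mm agar plates" means five plates of 150 mm diameter with phage spread evenly among them; the function names are illustrative.

```python
# 1X SSC as defined in Example 10 (concentrations in mM).
SSC_1X_MM = {"sodium citrate": 15.0, "NaCl": 150.0}


def ssc_strength(fold):
    """Component concentrations (mM) of an n-fold SSC solution."""
    return {name: conc * fold for name, conc in SSC_1X_MM.items()}


def phages_per_plate(total_phages, plates):
    """Average plating density if phage are spread evenly across the plates."""
    return total_phages / plates


print("6X SSC (hybridization):", ssc_strength(6))
print("2X SSC (wash):", ssc_strength(2))
print("phages per plate:", phages_per_plate(3e5, 5))
```

Lowering the salt from 6X to 2X SSC and raising the temperature from 42°C to 48°C makes the wash more stringent than the hybridization, so weakly matched probe is removed.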

Example 11 Southern Blot Analysis 

The membranes were washed as described above. BamHI-, 
HindIII- or PstI-digested P. gingivalis H66 DNA samples were 
hybridized with 32P-labeled GIN-8S-48. Two BamHI fragments of 
approximately 9.4 and 3.5 kb, and two PstI fragments of 
approximately 9.4 and 3 kb were found. No HindIII fragment was 
seen. BamHI- and PstI-digested λDASH DNA after screening and 
purification of positive recombinant clones from the library 
revealed one clone (A1) with a 3.5 kb BamHI fragment and a 3 kb 
PstI fragment; one clone (B1) with a 9.4 kb BamHI fragment and 
a 9.4 kb PstI fragment; and 5 clones with a 9.4 kb BamHI fragment 
and a 10 kb PstI fragment. The A1 clone was sequenced because 
the DNA predicted to encode a 50-kDa protein is approximately 
1.35 kb. In order to clone the stop codon of Arg-gingipain-2, 
double PstI/HindIII-digested P. gingivalis DNA was hybridized 
with 32P-labeled GIN-14-20. One PstI/HindIII fragment of 
approximately 4.3 kb was found. This fragment was gel purified 
and cloned into pBluescript SK(-) for sequencing. Smaller 
fragments (PstI/SmaI and BamHI/HindIII) were also subcloned into 
M13mp18 and 19 and sequenced, and were found to include the stop 
codon. Table 2 hereinabove (see also SEQ ID NO:10) 
presents about 7 kb of sequence extending from a PstI site 
upstream of the start codon through a HindIII site downstream of 
the end of the prepolyprotein's stop codon. 

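
The estimate that a 50-kDa protein requires roughly 1.35 kb of coding DNA can be reproduced with a rule-of-thumb calculation. The Python sketch below assumes an average residue mass of about 110 Da, which is a common approximation and is not a figure stated in this document; the function name is illustrative.

```python
AVERAGE_RESIDUE_MASS_DA = 110.0  # rule-of-thumb value, an assumption here


def coding_length_bp(protein_mass_da, residue_mass_da=AVERAGE_RESIDUE_MASS_DA):
    """Approximate length of DNA (bp) needed to encode a protein of a given mass."""
    residues = protein_mass_da / residue_mass_da
    return 3.0 * residues


print(f"{coding_length_bp(50_000) / 1000:.2f} kb")
```

With this assumption the estimate comes to about 1.36 kb, in line with the approximately 1.35 kb quoted in Example 11.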
Example 12 DNA Sequencing 

Double-stranded DNA cloned into pBluescript SK(-) and 
single-stranded DNA cloned into M13mp18 and 19 were sequenced by 
the dideoxy terminator method [Sanger et al. (1977) Proc. Natl. 
Acad. Sci. USA 74, 5463-5467] using sequencing kits purchased 
from United States Biochemicals (Cleveland, OH; Sequenase version 
2.0). The DNA was sequenced using M13 universal primer, reverse 
sequencing primer and internal primers as well understood in the 
art. 
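
The residue numbers and nucleotide coordinates quoted for the primers in Example 8 are related by codon arithmetic on the numbering of SEQ ID NO:3, in which the mature peptide begins at nucleotide 1630 and propeptide residues carry negative numbers. The Python sketch below shows the mapping; the function name codon_span is illustrative, and the only inputs are values stated in the sequence listing.

```python
MATURE_START_NT = 1630  # first base of the codon for residue 1 (mat_peptide start)


def codon_span(residue):
    """Nucleotide coordinates (first, last) of the codon for a given residue.

    Positive numbers address the mature protein, negative numbers the
    propeptide; there is no residue 0.
    """
    offset = residue - 1 if residue > 0 else residue
    first = MATURE_START_NT + 3 * offset
    return first, first + 2


print(codon_span(-227))  # initiator Met
print(codon_span(492))   # last codon inside mat_peptide 1630..3105
print(codon_span(510))   # last codon shown for SEQ ID NO:3
# GIN-8S-48 covers residues 11-22 but is shifted one base toward the 5' end.
print(codon_span(11)[0] - 1, codon_span(22)[1] - 1)
```

The initiator Met numbered -227 falls at nucleotides 949-951, which agrees with the running total of 957 printed after Met Lys Asn on the first coding line of the listing. The one-base shift reproduces nucleotides 1659-1694 given for GIN-8S-48.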
SEQUENCE LISTING 

(1) GENERAL INFORMATION: 

(i) APPLICANT: UNIVERSITY OF GEORGIA RESEARCH FOUNDATION, INC. 

(ii) TITLE OF INVENTION: Porphyromonas Gingivalis 
Arginine-Specific Proteinase Coding Sequences 

(iii) NUMBER OF SEQUENCES: 11 

(iv) CORRESPONDENCE ADDRESS: 
(A) ADDRESSEE: Greenlee and Winner, P.C. 
(B) STREET: 5370 Manhattan Circle, Suite 201 
(C) CITY: Boulder 
(D) STATE: CO 
(E) COUNTRY: USA 
(F) ZIP: 80303 

(v) COMPUTER READABLE FORM: 
(A) MEDIUM TYPE: Floppy disk 
(B) COMPUTER: IBM PC compatible 
(C) OPERATING SYSTEM: PC-DOS/MS-DOS 
(D) SOFTWARE: PatentIn Release #1.0, Version #1.25 

(vi) CURRENT APPLICATION DATA: 
(A) APPLICATION NUMBER: Unassigned 
(B) FILING DATE: 09-SEP-1994 
(C) CLASSIFICATION: 

(vii) PRIOR APPLICATION DATA: 
(A) APPLICATION NUMBER: US 08/119,361 
(B) FILING DATE: 10-SEP-1993 

(vii) PRIOR APPLICATION DATA: 
(A) APPLICATION NUMBER: US 08/265,441 
(B) FILING DATE: 24-JUN-1994 

(vii) PRIOR APPLICATION DATA: 
(A) APPLICATION NUMBER: US 08/141,324 
(B) FILING DATE: 21-OCT-1993 

(viii) ATTORNEY/AGENT INFORMATION: 
(A) NAME: Ferber, Donna M. 
(B) REGISTRATION NUMBER: 33,878 
(C) REFERENCE/DOCKET NUMBER: 21-93B PCT 

(ix) TELECOMMUNICATION INFORMATION: 
(A) TELEPHONE: 303-499-8080 
(B) TELEFAX: 303-499-8089 
(C) TELEX: 49617824 



(2) INFORMATION FOR SEQ ID NO:1: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 43 amino acids 
(B) TYPE: amino acid 
(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 
(iii) HYPOTHETICAL: NO 
(iv) ANTI-SENSE: NO 
(v) FRAGMENT TYPE: N-terminal 

(vi) ORIGINAL SOURCE: 
(A) ORGANISM: Porphyromonas gingivalis 
(B) STRAIN: H66 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:1: 

Tyr Thr Pro Val Glu Glu Lys Gln Asn Gly Arg Met Ile Val Ile Val 
1 5 10 15 

Ala Lys Lys Tyr Glu Gly Asp Ile Lys Asp Phe Val Asp Trp Lys Asn 
20 25 30 

Gln Arg Gly Leu Thr Lys Xaa Val Lys Xaa Ala 
35 40 

(2) INFORMATION FOR SEQ ID NO: 2: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 8 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(iii) HYPOTHETICAL: NO 

(iv) ANTI-SENSE: NO 

(v) FRAGMENT TYPE: internal 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 2: 

Gly Tyr Gly Asp Ser Asn Tyr Lys 
1 5 

(2) INFORMATION FOR SEQ ID NO: 3: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 3159 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 
(iv) ANTI-SENSE: NO 



(ix) FEATURE: 

(A) NAME/KEY: CDS 

(B) LOCATION: 949..3159 

(ix) FEATURE: 

(A) NAME/KEY: mat_peptide 

(B) LOCATION: 1630..3105 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:3: 
CTGCAGAGGG CTGGTAAAGA CCGCCTCGGG ATCGAGGCCT TTGAGACGGG CACAAGCCGC 60 

CGCAGCCTCC TCTTCGAAGG TGTCTCGAAC GTCCACATCG GTGAATCCGT AGCAGTGCTC 120 

ATTGCCATTG AGCAGCACCG AGGTGTGGCG CATCAGATAT ATTTTCATCA GTGGATTATT 180 

AGGGTATCGG TCAGAAAAAG CCTTCCGAAT CCGACAAAGA TAGTAGAAAG AGAGTGCATC 240 

TGAAAACAGA TCATTCGAGG ATTATCGATC AACTGAAAAG GCAGGAGTTG TTTTGCGTTT 300 

TGGTTCGGAA AATTACCTGA TCAGCATTCG TAAAAACGTG GCGCGAGAAT TTTTTCGTTT 360 

TGGCGCGAGA ATTAAAAATT TTTGGAACCA CAGCGAAAAA AATCTCGCGC CGTTTTCTCA 420 

GGATTTACAG ACCACAATCC GAGCATTTTC GGTTCGTAAT TCATCGAAGA GACAGGTTTT 480 

ACCGCATTGA AATCAGAGAG AGAATATCCG TAGTCCAACG GTTCATCCTT ATATCAGAGG 540 

TTAAAAGATA TGGTACGCTC ATCGAGGAGC TGATTGGCTT AGTAGGTGAG ACTTTCTTAA 600 

GAGACTATCG GCACCTACAG GAAGTTCATG GCACACAAGG CAAAGGAGGC AATCTTCGCA 660 

GACCGGACTC ATATCAAAAG GATGAAACGA CTTTTCCATA CGACAACCAA ATAGCCGTCT 720 

ACGGTAGACG AATGCAAACC CAATATGAGG CCATCAATCA ATCCGAATGA CAGCTTTTGG 780 

GCAATATATT ATGCATATTT TGATTCGCGT TTAAAGGAAA AGTGCATATA TTTGCGATTG 840 

TGGTATTTCT TTCGGTTTCT ATGTGAATTT TGTCTCCCAA GAAGACTTTA TAATGCATAA 900 

ATACAGAAGG GGTACTACAC AGTAAAATCA TATTCTAATT TCATCAAA ATG AAA AAC 957 

Met Lys Asn 
-227 -225 

TTG AAC AAG TTT GTT TCG ATT GCT CTT TGC TCT TCC TTA TTA GGA GGA 1005 
Leu Asn Lys Phe Val Ser Ile Ala Leu Cys Ser Ser Leu Leu Gly Gly 
-220 -215 -210 

ATG GCA TTT GCG CAG CAG ACA GAG TTG GGA CGC AAT CCG AAT GTC AGA 1053 
Met Ala Phe Ala Gln Gln Thr Glu Leu Gly Arg Asn Pro Asn Val Arg 
-205 -200 -195 

TTG CTC GAA TCC ACT CAG CAA TCG GTG ACA AAG GTT CAG TTC CGT ATG 1101 
Leu Leu Glu Ser Thr Gln Gln Ser Val Thr Lys Val Gln Phe Arg Met 
-190 -185 -180 

GAC AAC CTC AAG TTC ACC GAA GTT CAA ACC CCT AAG GGA ATC GGA CAA 1149 
Asp Asn Leu Lys Phe Thr Glu Val Gln Thr Pro Lys Gly Ile Gly Gln 
-175 -170 -165 

GTG CCG ACC TAT ACA GAA GGG GTT AAT CTT TCC GAA AAA GGG ATG CCT 1197 
Val Pro Thr Tyr Thr Glu Gly Val Asn Leu Ser Glu Lys Gly Met Pro 
-160 -155 -150 -145 

ACG CTT CCC ATT CTA TCA CGC TCT TTG GCG GTT TCA GAC ACT CGT GAG 1245 
Thr Leu Pro Ile Leu Ser Arg Ser Leu Ala Val Ser Asp Thr Arg Glu 
-140 -135 -130 

ATG AAG GTA GAG GTT GTT TCC TCA AAG TTC ATC GAA AAG AAA AAT GTC 1293 
Met Lys Val Glu Val Val Ser Ser Lys Phe Ile Glu Lys Lys Asn Val 
-125 -120 -115 

CTG ATT GCA CCC TCC AAG GGC ATG ATT ATG CGT AAC GAA GAT CCG AAA 1341 
Leu Ile Ala Pro Ser Lys Gly Met Ile Met Arg Asn Glu Asp Pro Lys 
-110 -105 -100 
AAG ATC CCT TAC GTT TAT GGA AAG AGC TAC TCG CAA AAC AAA TTC TTC 1389 
Lys Ile Pro Tyr Val Tyr Gly Lys Ser Tyr Ser Gln Asn Lys Phe Phe 
-95 -90 -85 

CCG GGA GAG ATC GCC ACG CTT GAT GAT CCT TTT ATC CTT CGT GAT GTG 1437 
Pro Gly Glu Ile Ala Thr Leu Asp Asp Pro Phe Ile Leu Arg Asp Val 
-80 -75 -70 -65 

CGT GGA CAG GTT GTA AAC TTT GCG CCT TTG CAG TAT AAC CCT GTG ACA 1485 
Arg Gly Gln Val Val Asn Phe Ala Pro Leu Gln Tyr Asn Pro Val Thr 
-60 -55 -50 

AAG ACG TTG CGC ATC TAT ACG GAA ATC ACT GTG GCA GTG AGC GAA ACT 1533 
Lys Thr Leu Arg Ile Tyr Thr Glu Ile Thr Val Ala Val Ser Glu Thr 
-45 -40 -35 

TCG GAA CAA GGC AAA AAT ATT CTG AAC AAG AAA GGT ACA TTT GCC GGC 1581 
Ser Glu Gln Gly Lys Asn Ile Leu Asn Lys Lys Gly Thr Phe Ala Gly 
-30 -25 -20 

TTT GAA GAC ACA TAC AAG CGC ATG TTC ATG AAC TAC GAG CCG GGG CGT 1629 
Phe Glu Asp Thr Tyr Lys Arg Met Phe Met Asn Tyr Glu Pro Gly Arg 
-15 -10 -5 

TAC ACA CCG GTA GAG GAA AAA CAA AAT GGT CGT ATG ATC GTC ATC GTA 1677 
Tyr Thr Pro Val Glu Glu Lys Gln Asn Gly Arg Met Ile Val Ile Val 
1 5 10 15 

GCC AAA AAG TAT GAG GGA GAT ATT AAA GAT TTC GTT GAT TGG AAA AAC 1725 
Ala Lys Lys Tyr Glu Gly Asp Ile Lys Asp Phe Val Asp Trp Lys Asn 
20 25 30 

CAA CGC GGT CTC CGT ACC GAG GTG AAA GTG GCA GAA GAT ATT GCT TCT 1773 
Gln Arg Gly Leu Arg Thr Glu Val Lys Val Ala Glu Asp Ile Ala Ser 
35 40 45 

CCC GTT ACA GCT AAT GCT ATT CAG CAG TTC GTT AAG CAA GAA TAC GAG 1821 
Pro Val Thr Ala Asn Ala Ile Gln Gln Phe Val Lys Gln Glu Tyr Glu 
50 55 60 

AAA GAA GGT AAT GAT TTG ACC TAT GTT CTT TTG GTT GGC GAT CAC AAA 1869 
Lys Glu Gly Asn Asp Leu Thr Tyr Val Leu Leu Val Gly Asp His Lys 
65 70 75 80 

GAT ATT CCT GCC AAA ATT ACT CCG GGG ATC AAA TCC GAC CAG GTA TAT 1917 
Asp Ile Pro Ala Lys Ile Thr Pro Gly Ile Lys Ser Asp Gln Val Tyr 
85 90 95 

GGA CAA ATA GTA GGT AAT GAC CAC TAC AAC GAA GTC TTC ATC GGT CGT 1965 
Gly Gln Ile Val Gly Asn Asp His Tyr Asn Glu Val Phe Ile Gly Arg 
100 105 110 

TTC TCA TGT GAG AGC AAA GAG GAT CTG AAG ACA CAA ATC GAT CGG ACT 2013 
Phe Ser Cys Glu Ser Lys Glu Asp Leu Lys Thr Gln Ile Asp Arg Thr 
115 120 125 

ATT CAC TAT GAG CGC AAT ATA ACC ACG GAA GAC AAA TGG CTC GGT CAG 2061 
Ile His Tyr Glu Arg Asn Ile Thr Thr Glu Asp Lys Trp Leu Gly Gln 
130 135 140 

GCT CTT TGT ATT GCT TCG GCT GAA GGA GGC CCA TCC GCA GAC AAT GGT 2109 
Ala Leu Cys Ile Ala Ser Ala Glu Gly Gly Pro Ser Ala Asp Asn Gly 
145 150 155 160 

GAA AGT GAT ATC CAG CAT GAG AAT GTA ATC GCC AAT CTG CTT ACC CAG 2157 
Glu Ser Asp Ile Gln His Glu Asn Val Ile Ala Asn Leu Leu Thr Gln 
165 170 175 
TAT GGC TAT ACC AAG ATT ATC AAA TGT TAT GAT CCG GGA GTA ACT CCT 2205 
Tyr Gly Tyr Thr Lys Ile Ile Lys Cys Tyr Asp Pro Gly Val Thr Pro 
180 185 190 

AAA AAC ATT ATT GAT GCT TTC AAC GGA GGA ATC TCG TTG GTC AAC TAT 2253 
Lys Asn Ile Ile Asp Ala Phe Asn Gly Gly Ile Ser Leu Val Asn Tyr 
195 200 205 

ACG GGC CAC GGT AGC GAA ACA GCT TGG GGT ACG TCT CAC TTC GGC ACC 2301 
Thr Gly His Gly Ser Glu Thr Ala Trp Gly Thr Ser His Phe Gly Thr 
210 215 220 

ACT CAT GTG AAG CAG CTT ACC AAC AGC AAC CAG CTA CCG TTT ATT TTC 2349 
Thr His Val Lys Gln Leu Thr Asn Ser Asn Gln Leu Pro Phe Ile Phe 
225 230 235 240 

GAC GTA GCT TGT GTG AAT GGC GAT TTC CTA TTC AGC ATG CCT TGC TTC 2397 
Asp Val Ala Cys Val Asn Gly Asp Phe Leu Phe Ser Met Pro Cys Phe 
245 250 255 

GCA GAA GCC CTG ATG CGT GCA CAA AAA GAT GGT AAG CCG ACA GGT ACT 2445 
Ala Glu Ala Leu Met Arg Ala Gln Lys Asp Gly Lys Pro Thr Gly Thr 
260 265 270 

GTT GCT ATC ATA GCG TCT ACG ATC AAC CAG TCT TGG GCT TCT CCT ATG 2493 
Val Ala Ile Ile Ala Ser Thr Ile Asn Gln Ser Trp Ala Ser Pro Met 
275 280 285 

CGC GGG CAG GAT GAG ATG AAC GAA ATT CTG TGC GAA AAA CAC CCG AAC 2541 
Arg Gly Gln Asp Glu Met Asn Glu Ile Leu Cys Glu Lys His Pro Asn 
290 295 300 

AAC ATC AAG CGT ACT TTC GGT GGT GTC ACC ATG AAC GGT ATG TTT GCT 2589 
Asn Ile Lys Arg Thr Phe Gly Gly Val Thr Met Asn Gly Met Phe Ala 
305 310 315 320 

ATG GTG GAA AAG TAT AAA AAG GAT GGT GAG AAG ATG CTC GAC ACA TGG 2637 
Met Val Glu Lys Tyr Lys Lys Asp Gly Glu Lys Met Leu Asp Thr Trp 
325 330 335 

ACT GTT TTC GGC GAC CCC TCG CTG CTC GTT CGT ACA CTT GTC CCG ACC 2685 
Thr Val Phe Gly Asp Pro Ser Leu Leu Val Arg Thr Leu Val Pro Thr 
340 345 350 

AAA ATG CAG GTT ACG GCT CCG GCT CAG ATT AAT TTG ACG GAT GCT TCA 2733 
Lys Met Gln Val Thr Ala Pro Ala Gln Ile Asn Leu Thr Asp Ala Ser 
355 360 365 

GTC AAC GTA TCT TGC GAT TAT AAT GGT GCT ATT GCT ACC ATT TCA GCC 2781 
Val Asn Val Ser Cys Asp Tyr Asn Gly Ala Ile Ala Thr Ile Ser Ala 
370 375 380 

AAT GGA AAG ATG TTC GGT TCT GCA GTT GTC GAA AAT GGA ACA GCT ACA 2829 
Asn Gly Lys Met Phe Gly Ser Ala Val Val Glu Asn Gly Thr Ala Thr 
385 390 395 400 

ATC AAT CTG ACA GGT CTG ACA AAT GAA AGC ACG CTT ACC CTT ACA GTA 2877 
Ile Asn Leu Thr Gly Leu Thr Asn Glu Ser Thr Leu Thr Leu Thr Val 
405 410 415 

GTT GGT TAC AAC AAA GAG ACG GTT ATT AAG ACC ATC AAC ACT AAT GGT 2925 
Val Gly Tyr Asn Lys Glu Thr Val Ile Lys Thr Ile Asn Thr Asn Gly 
420 425 430 

GAG CCT AAC CCC TAC CAG CCC GTT TCC AAC TTG ACA GCT ACA ACG CAG 2973 
Glu Pro Asn Pro Tyr Gln Pro Val Ser Asn Leu Thr Ala Thr Thr Gln 
435 440 445 
GGT CAG AAA GTA ACG CTC AAG TGG GAT GCA CCG AGC ACG AAA ACC AAT 3021 
Gly Gln Lys Val Thr Leu Lys Trp Asp Ala Pro Ser Thr Lys Thr Asn 
450 455 460 

GCA ACC ACT AAT ACC GCT CGC AGC GTG GAT GGC ATA CGA GAA TTG GTT 3069 
Ala Thr Thr Asn Thr Ala Arg Ser Val Asp Gly Ile Arg Glu Leu Val 
465 470 475 480 

CTT CTG TCA GTC AGC GAT GCC CCC GAA CTT CTT CGC AGC GGT CAG GCC 3117 
Leu Leu Ser Val Ser Asp Ala Pro Glu Leu Leu Arg Ser Gly Gln Ala 
485 490 495 

GAG ATT GTT CTT GAA GCT CAC GAT GTT TGG AAT GAT GGA TCC 3159 
Glu Ile Val Leu Glu Ala His Asp Val Trp Asn Asp Gly Ser 
500 505 510 

(2) INFORMATION FOR SEQ ID NO: 4: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 737 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4: 

Met Lys Asn Leu Asn Lys Phe Val Ser Ile Ala Leu Cys Ser Ser Leu 
-227 -225 -220 -215 

Leu Gly Gly Met Ala Phe Ala Gln Gln Thr Glu Leu Gly Arg Asn Pro 
-210 -205 -200 

Asn Val Arg Leu Leu Glu Ser Thr Gln Gln Ser Val Thr Lys Val Gln 
-195 -190 -185 -180 

Phe Arg Met Asp Asn Leu Lys Phe Thr Glu Val Gln Thr Pro Lys Gly 
-175 -170 -165 

Ile Gly Gln Val Pro Thr Tyr Thr Glu Gly Val Asn Leu Ser Glu Lys 
-160 -155 -150 

Gly Met Pro Thr Leu Pro Ile Leu Ser Arg Ser Leu Ala Val Ser Asp 
-145 -140 -135 

Thr Arg Glu Met Lys Val Glu Val Val Ser Ser Lys Phe Ile Glu Lys 
-130 -125 -120 

Lys Asn Val Leu Ile Ala Pro Ser Lys Gly Met Ile Met Arg Asn Glu 
-115 -110 -105 -100 

Asp Pro Lys Lys Ile Pro Tyr Val Tyr Gly Lys Ser Tyr Ser Gln Asn 
-95 -90 -85 

Lys Phe Phe Pro Gly Glu Ile Ala Thr Leu Asp Asp Pro Phe Ile Leu 
-80 -75 -70 

Arg Asp Val Arg Gly Gln Val Val Asn Phe Ala Pro Leu Gln Tyr Asn 
-65 -60 -55 

Pro Val Thr Lys Thr Leu Arg Ile Tyr Thr Glu Ile Thr Val Ala Val 
-50 -45 -40 

Ser Glu Thr Ser Glu Gln Gly Lys Asn Ile Leu Asn Lys Lys Gly Thr 
-35 -30 -25 -20 

Phe Ala Gly Phe Glu Asp Thr Tyr Lys Arg Met Phe Met Asn Tyr Glu 
-15 -10 -5 

Pro Gly Arg Tyr Thr Pro Val Glu Glu Lys Gln Asn Gly Arg Met Ile 
1 5 10 

Val Ile Val Ala Lys Lys Tyr Glu Gly Asp Ile Lys Asp Phe Val Asp 
15 20 25 

Trp Lys Asn Gln Arg Gly Leu Arg Thr Glu Val Lys Val Ala Glu Asp 
30 35 40 45 

Ile Ala Ser Pro Val Thr Ala Asn Ala Ile Gln Gln Phe Val Lys Gln 
50 55 60 

Glu Tyr Glu Lys Glu Gly Asn Asp Leu Thr Tyr Val Leu Leu Val Gly 
65 70 75 

Asp His Lys Asp Ile Pro Ala Lys Ile Thr Pro Gly Ile Lys Ser Asp 
80 85 90 

Gln Val Tyr Gly Gln Ile Val Gly Asn Asp His Tyr Asn Glu Val Phe 
95 100 105 

Ile Gly Arg Phe Ser Cys Glu Ser Lys Glu Asp Leu Lys Thr Gln Ile 
110 115 120 125 

Asp Arg Thr Ile His Tyr Glu Arg Asn Ile Thr Thr Glu Asp Lys Trp 
130 135 140 

Leu Gly Gln Ala Leu Cys Ile Ala Ser Ala Glu Gly Gly Pro Ser Ala 
145 150 155 

Asp Asn Gly Glu Ser Asp Ile Gln His Glu Asn Val Ile Ala Asn Leu 
160 165 170 

Leu Thr Gln Tyr Gly Tyr Thr Lys Ile Ile Lys Cys Tyr Asp Pro Gly 
175 180 185 

Val Thr Pro Lys Asn Ile Ile Asp Ala Phe Asn Gly Gly Ile Ser Leu 
190 195 200 205 

Val Asn Tyr Thr Gly His Gly Ser Glu Thr Ala Trp Gly Thr Ser His 
210 215 220 

Phe Gly Thr Thr His Val Lys Gln Leu Thr Asn Ser Asn Gln Leu Pro 
225 230 235 

Phe Ile Phe Asp Val Ala Cys Val Asn Gly Asp Phe Leu Phe Ser Met 
240 245 250 

Pro Cys Phe Ala Glu Ala Leu Met Arg Ala Gln Lys Asp Gly Lys Pro 
255 260 265 

Thr Gly Thr Val Ala Ile Ile Ala Ser Thr Ile Asn Gln Ser Trp Ala 
270 275 280 285 

Ser Pro Met Arg Gly Gln Asp Glu Met Asn Glu Ile Leu Cys Glu Lys 
290 295 300 

His Pro Asn Asn Ile Lys Arg Thr Phe Gly Gly Val Thr Met Asn Gly 
305 310 315 

Met Phe Ala Met Val Glu Lys Tyr Lys Lys Asp Gly Glu Lys Met Leu 
320 325 330 

Asp Thr Trp Thr Val Phe Gly Asp Pro Ser Leu Leu Val Arg Thr Leu 
335 340 345 

Val Pro Thr Lys Met Gln Val Thr Ala Pro Ala Gln Ile Asn Leu Thr 
350 355 360 365 

Asp Ala Ser Val Asn Val Ser Cys Asp Tyr Asn Gly Ala Ile Ala Thr 
370 375 380 

Ile Ser Ala Asn Gly Lys Met Phe Gly Ser Ala Val Val Glu Asn Gly 
385 390 395 

Thr Ala Thr Ile Asn Leu Thr Gly Leu Thr Asn Glu Ser Thr Leu Thr 
400 405 410 

Leu Thr Val Val Gly Tyr Asn Lys Glu Thr Val Ile Lys Thr Ile Asn 
415 420 425 

Thr Asn Gly Glu Pro Asn Pro Tyr Gln Pro Val Ser Asn Leu Thr Ala 
430 435 440 445 

Thr Thr Gln Gly Gln Lys Val Thr Leu Lys Trp Asp Ala Pro Ser Thr 
450 455 460 

Lys Thr Asn Ala Thr Thr Asn Thr Ala Arg Ser Val Asp Gly Ile Arg 
465 470 475 

Glu Leu Val Leu Leu Ser Val Ser Asp Ala Pro Glu Leu Leu Arg Ser 
480 485 490 

Gly Gln Ala Glu Ile Val Leu Glu Ala His Asp Val Trp Asn Asp Gly 
495 500 505 

Ser 
510 



(2) INFORMATION FOR SEQ ID NO: 5: 



(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 4 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 
(iii) HYPOTHETICAL: NO 
(iv) ANTI-SENSE: NO 
(v) FRAGMENT TYPE: C-terminal 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5: 

Glu Leu Leu Arg 
1 

(2) INFORMATION FOR SEQ ID NO: 6: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 26 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (other nucleic acid) 
(iii) HYPOTHETICAL: NO 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6: 
GGCTTTACNC CNGTNGARGA RYTNGA 26 



(2) INFORMATION FOR SEQ ID NO: 7: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 30 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (other nucleic acid) 
(iii) HYPOTHETICAL: NO 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7: 
GGCTTTRTTY TTCCARTCNA CRAARTCYTT 30 
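
SEQ ID NO: 6 and SEQ ID NO: 7 are written with IUPAC ambiguity codes (N, R and Y), so each entry stands for a pool of oligonucleotides rather than a single molecule. The following Python sketch is illustrative only and is not part of the original listing; it counts how many distinct sequences each degenerate entry represents. The helper name `degeneracy` and the reduced code table (covering only the symbols that occur in these two entries) are assumptions made for the example.

```python
# Illustrative sketch: only the IUPAC symbols that occur in SEQ ID NO: 6 and
# SEQ ID NO: 7 are tabulated here (an assumption made to keep the example short).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG",    # purine
    "Y": "CT",    # pyrimidine
    "N": "ACGT",  # any base
}


def degeneracy(oligo: str) -> int:
    """Return the number of distinct sequences a degenerate oligo represents."""
    count = 1
    for base in oligo.replace(" ", "").upper():
        count *= len(IUPAC[base])
    return count


SEQ_ID_6 = "GGCTTTACNC CNGTNGARGA RYTNGA"
SEQ_ID_7 = "GGCTTTRTTY TTCCARTCNA CRAARTCYTT"

print(len(SEQ_ID_6.replace(" ", "")), degeneracy(SEQ_ID_6))  # 26 2048
print(len(SEQ_ID_7.replace(" ", "")), degeneracy(SEQ_ID_7))  # 30 256
```

The lengths printed agree with the 26 and 30 base pairs stated for the two entries.
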
(2) INFORMATION FOR SEQ ID NO: 8: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 48 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (other nucleic acid) 
(iii) HYPOTHETICAL: NO 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8: 
CCTGGAGAAT TCTCGTATGA TCGTCATCGT AGCCAAAAAG TATGAGGG 48 
(2) INFORMATION FOR SEQ ID NO: 9: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (other nucleic acid) 
(iii) HYPOTHETICAL: NO 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 9: 
ATCAACACTA ATGGTGAGCC 20 
(2) INFORMATION FOR SEQ ID NO: 10: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 7266 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 
(iv) ANTI-SENSE: NO 

(ix) FEATURE: 

(A) NAME/KEY: CDS 

(B) LOCATION: 949..6063 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10: 

CTGCAGAGGG CTGGTAAAGA CCGCCTCGGG ATCGAGGCCT TTGAGACGGG CACAAGCCGC 60 

CGCAGCCTCC TCTTCGAAGG TGTCTCGAAC GTCCACATCG GTGAATCCGT AGCAGTGCTC 120 

ATTGCCATTG AGCAGCACCG AGGTGTGGCG CATCAGATAT ATTTTCATCA GTGGATTATT 180 

AGGGTATCGG TCAGAAAAAG CCTTCCGAAT CCGACAAAGA TAGTAGAAAG AGAGTGCATC 240 

TGAAAACAGA TCATTCGAGG ATTATCGATC AACTGAAAAG GCAGGAGTTG TTTTGCGTTT 300 

TGGTTCGGAA AATTACCTGA TCAGCATTCG TAAAAACGTG GCGCGAGAAT TTTTTCGTTT 360 

TGGCGCGAGA ATTAAAAATT TTTGGAACCA CAGCGAAAAA AATCTCGCGC CGTTTTCTCA 420 

GGATTTACAG ACCACAATCC GAGCATTTTC GGTTCGTAAT TCATCGAAGA GACAGGTTTT 480 

ACCGCATTGA AATCAGAGAG AGAATATCCG TAGTCCAACG GTTCATCCTT ATATCAGAGG 540 

TTAAAAGATA TGGTACGCTC ATCGAGGAGC TGATTGGCTT AGTAGGTGAG ACTTTCTTAA 600 

GAGACTATCG GCACCTACAG GAAGTTCATG GCACACAAGG CAAAGGAGGC AATCTTCGCA 660 

GACCGGACTC ATATCAAAAG GATGAAACGA CTTTTCCATA CGACAACCAA ATAGCCGTCT 720 

ACGGTAGACG AATGCAAACC CAATATGAGG CCATCAATCA ATCCGAATGA CAGCTTTTGG 780 

GCAATATATT ATGCATATTT TGATTCGCGT TTAAAGGAAA AGTGCATATA TTTGCGATTG 840 

TGGTATTTCT TTCGGTTTCT ATGTGAATTT TGTCTCCCAA GAAGACTTTA TAATGCATAA 900 

ATACAGAAGG GGTACTACAC AGTAAAATCA TATTCTAATT TCATCAAA ATG AAA AAC 957 
Met Lys Asn 
1 

TTG AAC AAG TTT GTT TCG ATT GCT CTT TGC TCT TCC TTA TTA GGA GGA 1005 
Leu Asn Lys Phe Val Ser Ile Ala Leu Cys Ser Ser Leu Leu Gly Gly 
5 10 15 

ATG GCA TTT GCG CAG CAG ACA GAG TTG GGA CGC AAT CCG AAT GTC AGA 1053 
Met Ala Phe Ala Gln Gln Thr Glu Leu Gly Arg Asn Pro Asn Val Arg 
20 25 30 35 

TTG CTC GAA TCC ACT CAG CAA TCG GTG ACA AAG GTT CAG TTC CGT ATG 1101 
Leu Leu Glu Ser Thr Gln Gln Ser Val Thr Lys Val Gln Phe Arg Met 
40 45 50 

GAC AAC CTC AAG TTC ACC GAA GTT CAA ACC CCT AAG GGA ATC GGA CAA 1149 
Asp Asn Leu Lys Phe Thr Glu Val Gln Thr Pro Lys Gly Ile Gly Gln 
55 60 65 

GTG CCG ACC TAT ACA GAA GGG GTT AAT CTT TCC GAA AAA GGG ATG CCT 1197 
Val Pro Thr Tyr Thr Glu Gly Val Asn Leu Ser Glu Lys Gly Met Pro 
70 75 80 

ACG CTT CCC ATT CTA TCA CGC TCT TTG GCG GTT TCA GAC ACT CGT GAG 1245 
Thr Leu Pro Ile Leu Ser Arg Ser Leu Ala Val Ser Asp Thr Arg Glu 
85 90 95 

ATG AAG GTA GAG GTT GTT TCC TCA AAG TTC ATC GAA AAG AAA AAT GTC 1293 
Met Lys Val Glu Val Val Ser Ser Lys Phe Ile Glu Lys Lys Asn Val 
100 105 110 115 

CTG ATT GCA CCC TCC AAG GGC ATG ATT ATG CGT AAC GAA GAT CCG AAA 1341 
Leu Ile Ala Pro Ser Lys Gly Met Ile Met Arg Asn Glu Asp Pro Lys 
120 125 130 

AAG ATC CCT TAC GTT TAT GGA AAG AGC TAC TCG CAA AAC AAA TTC TTC 1389 
Lys Ile Pro Tyr Val Tyr Gly Lys Ser Tyr Ser Gln Asn Lys Phe Phe 
135 140 145 

CCG GGA GAG ATC GCC ACG CTT GAT GAT CCT TTT ATC CTT CGT GAT GTG 1437 
Pro Gly Glu Ile Ala Thr Leu Asp Asp Pro Phe Ile Leu Arg Asp Val 
150 155 160 

CGT GGA CAG GTT GTA AAC TTT GCG CCT TTG CAG TAT AAC CCT GTG ACA 1485 
Arg Gly Gln Val Val Asn Phe Ala Pro Leu Gln Tyr Asn Pro Val Thr 
165 170 175 

AAG ACG TTG CGC ATC TAT ACG GAA ATC ACT GTG GCA GTG AGC GAA ACT 1533 
Lys Thr Leu Arg Ile Tyr Thr Glu Ile Thr Val Ala Val Ser Glu Thr 
180 185 190 195 

TCG GAA CAA GGC AAA AAT ATT CTG AAC AAG AAA GGT ACA TTT GCC GGC 1581 
Ser Glu Gln Gly Lys Asn Ile Leu Asn Lys Lys Gly Thr Phe Ala Gly 
200 205 210 

TTT GAA GAC ACA TAC AAG CGC ATG TTC ATG AAC TAC GAG CCG GGG CGT 1629 
Phe Glu Asp Thr Tyr Lys Arg Met Phe Met Asn Tyr Glu Pro Gly Arg 
215 220 225 

TAC ACA CCG GTA GAG GAA AAA CAA AAT GGT CGT ATG ATC GTC ATC GTA 1677 
Tyr Thr Pro Val Glu Glu Lys Gln Asn Gly Arg Met Ile Val Ile Val 
230 235 240 

GCC AAA AAG TAT GAG GGA GAT ATT AAA GAT TTC GTT GAT TGG AAA AAC 1725 
Ala Lys Lys Tyr Glu Gly Asp Ile Lys Asp Phe Val Asp Trp Lys Asn 
245 250 255 

CAA CGC GGT CTC CGT ACC GAG GTG AAA GTG GCA GAA GAT ATT GCT TCT 1773 
Gln Arg Gly Leu Arg Thr Glu Val Lys Val Ala Glu Asp Ile Ala Ser 
260 265 270 275 

CCC GTT ACA GCT AAT GCT ATT CAG CAG TTC GTT AAG CAA GAA TAC GAG 1821 
Pro Val Thr Ala Asn Ala Ile Gln Gln Phe Val Lys Gln Glu Tyr Glu 
280 285 290 

AAA GAA GGT AAT GAT TTG ACC TAT GTT CTT TTG GTT GGC GAT CAC AAA 1869 
Lys Glu Gly Asn Asp Leu Thr Tyr Val Leu Leu Val Gly Asp His Lys 
295 300 305 

GAT ATT CCT GCC AAA ATT ACT CCG GGG ATC AAA TCC GAC CAG GTA TAT 1917 
Asp Ile Pro Ala Lys Ile Thr Pro Gly Ile Lys Ser Asp Gln Val Tyr 
310 315 320 

GGA CAA ATA GTA GGT AAT GAC CAC TAC AAC GAA GTC TTC ATC GGT CGT 1965 
Gly Gln Ile Val Gly Asn Asp His Tyr Asn Glu Val Phe Ile Gly Arg 
325 330 335 

TTC TCA TGT GAG AGC AAA GAG GAT CTG AAG ACA CAA ATC GAT CGG ACT 2013 
Phe Ser Cys Glu Ser Lys Glu Asp Leu Lys Thr Gln Ile Asp Arg Thr 
340 345 350 355 

ATT CAC TAT GAG CGC AAT ATA ACC ACG GAA GAC AAA TGG CTC GGT CAG 2061 
Ile His Tyr Glu Arg Asn Ile Thr Thr Glu Asp Lys Trp Leu Gly Gln 
360 365 370 

GCT CTT TGT ATT GCT TCG GCT GAA GGA GGC CCA TCC GCA GAC AAT GGT 2109 
Ala Leu Cys Ile Ala Ser Ala Glu Gly Gly Pro Ser Ala Asp Asn Gly 
375 380 385 

GAA AGT GAT ATC CAG CAT GAG AAT GTA ATC GCC AAT CTG CTT ACC CAG 2157 
Glu Ser Asp Ile Gln His Glu Asn Val Ile Ala Asn Leu Leu Thr Gln 
390 395 400 

TAT GGC TAT ACC AAG ATT ATC AAA TGT TAT GAT CCG GGA GTA ACT CCT 2205 
Tyr Gly Tyr Thr Lys Ile Ile Lys Cys Tyr Asp Pro Gly Val Thr Pro 
405 410 415 

AAA AAC ATT ATT GAT GCT TTC AAC GGA GGA ATC TCG TTG GTC AAC TAT 2253 
Lys Asn Ile Ile Asp Ala Phe Asn Gly Gly Ile Ser Leu Val Asn Tyr 
420 425 430 435 

ACG GGC CAC GGT AGC GAA ACA GCT TGG GGT ACG TCT CAC TTC GGC ACC 2301 
Thr Gly His Gly Ser Glu Thr Ala Trp Gly Thr Ser His Phe Gly Thr 
440 445 450 

ACT CAT GTG AAG CAG CTT ACC AAC AGC AAC CAG CTA CCG TTT ATT TTC 2349 
Thr His Val Lys Gln Leu Thr Asn Ser Asn Gln Leu Pro Phe Ile Phe 
455 460 465 

GAC GTA GCT TGT GTG AAT GGC GAT TTC CTA TTC AGC ATG CCT TGC TTC 2397 
Asp Val Ala Cys Val Asn Gly Asp Phe Leu Phe Ser Met Pro Cys Phe 
470 475 480 

GCA GAA GCC CTG ATG CGT GCA CAA AAA GAT GGT AAG CCG ACA GGT ACT 2445 
Ala Glu Ala Leu Met Arg Ala Gln Lys Asp Gly Lys Pro Thr Gly Thr 
485 490 495 

GTT GCT ATC ATA GCG TCT ACG ATC AAC CAG TCT TGG GCT TCT CCT ATG 2493 
Val Ala Ile Ile Ala Ser Thr Ile Asn Gln Ser Trp Ala Ser Pro Met 
500 505 510 515 

CGC GGG CAG GAT GAG ATG AAC GAA ATT CTG TGC GAA AAA CAC CCG AAC 2541 
Arg Gly Gln Asp Glu Met Asn Glu Ile Leu Cys Glu Lys His Pro Asn 
520 525 530 

AAC ATC AAG CGT ACT TTC GGT GGT GTC ACC ATG AAC GGT ATG TTT GCT 2589 
Asn Ile Lys Arg Thr Phe Gly Gly Val Thr Met Asn Gly Met Phe Ala 
535 540 545 

ATG GTG GAA AAG TAT AAA AAG GAT GGT GAG AAG ATG CTC GAC ACA TGG 2637 
Met Val Glu Lys Tyr Lys Lys Asp Gly Glu Lys Met Leu Asp Thr Trp 
550 555 560 

ACT GTT TTC GGC GAC CCC TCG CTG CTC GTT CGT ACA CTT GTC CCG ACC 2685 
Thr Val Phe Gly Asp Pro Ser Leu Leu Val Arg Thr Leu Val Pro Thr 
565 570 575 

AAA ATG CAG GTT ACG GCT CCG GCT CAG ATT AAT TTG ACG GAT GCT TCA 2733 
Lys Met Gln Val Thr Ala Pro Ala Gln Ile Asn Leu Thr Asp Ala Ser 
580 585 590 595 

GTC AAC GTA TCT TGC GAT TAT AAT GGT GCT ATT GCT ACC ATT TCA GCC 2781 
Val Asn Val Ser Cys Asp Tyr Asn Gly Ala Ile Ala Thr Ile Ser Ala 
600 605 610 

AAT GGA AAG ATG TTC GGT TCT GCA GTT GTC GAA AAT GGA ACA GCT ACA 2829 
Asn Gly Lys Met Phe Gly Ser Ala Val Val Glu Asn Gly Thr Ala Thr 
615 620 625 

ATC AAT CTG ACA GGT CTG ACA AAT GAA AGC ACG CTT ACC CTT ACA GTA 2877 
Ile Asn Leu Thr Gly Leu Thr Asn Glu Ser Thr Leu Thr Leu Thr Val 
630 635 640 

GTT GGT TAC AAC AAA GAG ACG GTT ATT AAG ACC ATC AAC ACT AAT GGT 2925 
Val Gly Tyr Asn Lys Glu Thr Val Ile Lys Thr Ile Asn Thr Asn Gly 
645 650 655 

GAG CCT AAC CCC TAC CAG CCC GTT TCC AAC TTG ACA GCT ACA ACG CAG 2973 
Glu Pro Asn Pro Tyr Gln Pro Val Ser Asn Leu Thr Ala Thr Thr Gln 
660 665 670 675 

GGT CAG AAA GTA ACG CTC AAG TGG GAT GCA CCG AGC ACG AAA ACC AAT 3021 
Gly Gln Lys Val Thr Leu Lys Trp Asp Ala Pro Ser Thr Lys Thr Asn 
680 685 690 

GCA ACC ACT AAT ACC GCT CGC AGC GTG GAT GGC ATA CGA GAA TTG GTT 3069 
Ala Thr Thr Asn Thr Ala Arg Ser Val Asp Gly Ile Arg Glu Leu Val 
695 700 705 

CTT CTG TCA GTC AGC GAT GCC CCC GAA CTT CTT CGC AGC GGT CAG GCC 3117 
Leu Leu Ser Val Ser Asp Ala Pro Glu Leu Leu Arg Ser Gly Gln Ala 
710 715 720 

GAG ATT GTT CTT GAA GCT CAC GAT GTT TGG AAT GAT GGA TCC GGT TAT 3165 
Glu Ile Val Leu Glu Ala His Asp Val Trp Asn Asp Gly Ser Gly Tyr 
725 730 735 

CAG ATT CTT TTG GAT GCA GAC CAT GAT CAA TAT GGA CAG GTT ATA CCC 3213 
Gln Ile Leu Leu Asp Ala Asp His Asp Gln Tyr Gly Gln Val Ile Pro 
740 745 750 755 

AGT GAT ACC CAT ACT CTT TGG CCG AAC TGT AGT GTC CCG GCC AAT CTG 3261 
Ser Asp Thr His Thr Leu Trp Pro Asn Cys Ser Val Pro Ala Asn Leu 
760 765 770 

TTC GCT CCG TTC GAA TAT ACT GTT CCG GAA AAT GCA GAT CCT TCT TGT 3309 
Phe Ala Pro Phe Glu Tyr Thr Val Pro Glu Asn Ala Asp Pro Ser Cys 
775 780 785 

TCC CCT ACC AAT ATG ATA ATG GAT GGT ACT GCA TCC GTT AAT ATA CCG 3357 
Ser Pro Thr Asn Met Ile Met Asp Gly Thr Ala Ser Val Asn Ile Pro 
790 795 800 

GCC GGA ACT TAT GAC TTT GCA ATT GCT GCT CCT CAA GCA AAT GCA AAG 3405 
Ala Gly Thr Tyr Asp Phe Ala Ile Ala Ala Pro Gln Ala Asn Ala Lys 
805 810 815 

ATT TGG ATT GCC GGA CAA GGA CCG ACG AAA GAA GAT GAT TAT GTA TTT 3453 
Ile Trp Ile Ala Gly Gln Gly Pro Thr Lys Glu Asp Asp Tyr Val Phe 
820 825 830 835 

GAA GCC GGT AAA AAA TAC CAT TTC CTT ATG AAG AAG ATG GGT AGC GGT 3501 
Glu Ala Gly Lys Lys Tyr His Phe Leu Met Lys Lys Met Gly Ser Gly 
840 845 850 

GAT GGA ACT GAA TTG ACT ATA AGC GAA GGT GGT GGA AGC GAT TAC ACC 3549 
Asp Gly Thr Glu Leu Thr Ile Ser Glu Gly Gly Gly Ser Asp Tyr Thr 
855 860 865 

TAT ACT GTC TAT CGT GAC GGC ACG AAG ATC AAG GAA GGT CTG ACG GCT 3597 
Tyr Thr Val Tyr Arg Asp Gly Thr Lys Ile Lys Glu Gly Leu Thr Ala 
870 875 880 



ACG ACA TTC GAA GAA GAC GGT GTA GCT ACG GGC AAT CAT GAG TAT TGC 3645 
Thr Thr Phe Glu Glu Asp Gly Val Ala Thr Gly Asn His Glu Tyr Cys 
885 890 895 

GTG GAA GTT AAG TAC ACA GCC GGC GTA TCT CCG AAG GTA TGT AAA GAC 3693 
Val Glu Val Lys Tyr Thr Ala Gly Val Ser Pro Lys Val Cys Lys Asp 
900 905 910 915 

GTT ACG GTA GAA GGA TCC AAT GAA TTT GCT CCT GTA CAG AAC CTG ACC 3741 
Val Thr Val Glu Gly Ser Asn Glu Phe Ala Pro Val Gln Asn Leu Thr 
920 925 930 

GGT AGT GCA GTC GGC CAG AAA GTA ACG CTC AAG TGG GAT GCA CCT AAT 3789 
Gly Ser Ala Val Gly Gln Lys Val Thr Leu Lys Trp Asp Ala Pro Asn 
935 940 945 

GGT ACC CCG AAT CCA AAT CCG AAT CCG AAT CCG AAT CCC GGA ACA ACA 3837 
Gly Thr Pro Asn Pro Asn Pro Asn Pro Asn Pro Asn Pro Gly Thr Thr 
950 955 960 

ACA CTT TCC GAA TCA TTC GAA AAT GGT ATT CCT GCC TCA TGG AAG ACG 3885 
Thr Leu Ser Glu Ser Phe Glu Asn Gly Ile Pro Ala Ser Trp Lys Thr 
965 970 975 

ATC GAT GCA GAC GGT GAC GGG CAT GGC TGG AAG CCT GGA AAT GCT CCC 3933 
Ile Asp Ala Asp Gly Asp Gly His Gly Trp Lys Pro Gly Asn Ala Pro 
980 985 990 995 

GGA ATC GCT GGC TAC AAT AGC AAT GGT TGT GTA TAT TCA GAG TCA TTC 3981 
Gly Ile Ala Gly Tyr Asn Ser Asn Gly Cys Val Tyr Ser Glu Ser Phe 
1000 1005 1010 

GGT CTT GGT GGT ATA GGA GTT CTT ACC CCT GAC AAC TAT CTG ATA ACA 4029 
Gly Leu Gly Gly Ile Gly Val Leu Thr Pro Asp Asn Tyr Leu Ile Thr 
1015 1020 1025 

CCG GCA TTG GAT TTG CCT AAC GGA GGT AAG TTG ACT TTC TGG GTA TGC 4077 
Pro Ala Leu Asp Leu Pro Asn Gly Gly Lys Leu Thr Phe Trp Val Cys 
1030 1035 1040 

GCA CAG GAT GCT AAT TAT GCA TCC GAG CAC TAT GCG GTG TAT GCA TCT 4125 
Ala Gln Asp Ala Asn Tyr Ala Ser Glu His Tyr Ala Val Tyr Ala Ser 
1045 1050 1055 

TCG ACC GGT AAC GAT GCA TCC AAC TTC ACG AAT GCT TTG TTG GAA GAG 4173 
Ser Thr Gly Asn Asp Ala Ser Asn Phe Thr Asn Ala Leu Leu Glu Glu 
1060 1065 1070 1075 

ACG ATT ACG GCA AAA GGT GTT CGC TCG CCG GAA GCT ATT CGT GGT CGT 4221 
Thr Ile Thr Ala Lys Gly Val Arg Ser Pro Glu Ala Ile Arg Gly Arg 
1080 1085 1090 



ATA CAG GGT ACT TGG CGC CAG AAG ACG GTA GAC CTT CCC GCA GGT ACG 4269 
Ile Gln Gly Thr Trp Arg Gln Lys Thr Val Asp Leu Pro Ala Gly Thr 
1095 1100 1105 

AAA TAT GTT GCT TTC CGT CAC TTC CAA AGC ACG GAT ATG TTC TAC ATC 4317 
Lys Tyr Val Ala Phe Arg His Phe Gln Ser Thr Asp Met Phe Tyr Ile 
1110 1115 1120 

CAC CTT GAT GAG GTT GAG ATC AAG GCC AAC GGC AAG CGC GCA GAC TTC 4365 
Asp Leu Asp Glu Val Glu Ile Lys Ala Asn Gly Lys Arg Ala Asp Phe 
1125 1130 1135 

ACG GAA ACG TTC GAG TCT TCT ACT CAT GGA GAG GCA CCG GCG GAA TGG 4413 
Thr Glu Thr Phe Glu Ser Ser Thr His Gly Glu Ala Pro Ala Glu Trp 
1140 1145 1150 1155 

ACT ACT ATC GAT GCC GAT GGC GAT GGT CAG GGT TGG CTC TGT CTG TCT 4461 
Thr Thr Ile Asp Ala Asp Gly Asp Gly Gln Gly Trp Leu Cys Leu Ser 
1160 1165 1170 

TCC GGA CAA TTG GAC TGG CTG ACA GCT CAT GGC GGC ACC AAC GTA GTA 4509 
Ser Gly Gln Leu Asp Trp Leu Thr Ala His Gly Gly Thr Asn Val Val 
1175 1180 1185 



GCC TCT TTC TCA TGG AAT GGA ATG GCT TTG AAT CCT GAT AAC TAT CTC 4557 
Ala Ser Phe Ser Trp Asn Gly Met Ala Leu Asn Pro Asp Asn Tyr Leu 
1190 1195 1200 

ATC TCA AAG GAT GTT ACA GGC GCA ACG AAG GTA AAG TAC TAC TAT GCA 4605 
Ile Ser Lys Asp Val Thr Gly Ala Thr Lys Val Lys Tyr Tyr Tyr Ala 
1205 1210 1215 

GTC AAC GAC GGT TTT CCC GGG GAT CAC TAT GCG GTG ATG ATC TCC AAG 4653 
Val Asn Asp Gly Phe Pro Gly Asp His Tyr Ala Val Met Ile Ser Lys 
1220 1225 1230 1235 

ACG GGC ACG AAC GCC GGA GAC TTC ACG GTT GTT TTC GAA GAA ACG CCT 4701 
Thr Gly Thr Asn Ala Gly Asp Phe Thr Val Val Phe Glu Glu Thr Pro 
1240 1245 1250 

AAC GGA ATA AAT AAG GGC GGA GCA AGA TTC GGT CTT TCC ACG GAA GCC 4749 
Asn Gly Ile Asn Lys Gly Gly Ala Arg Phe Gly Leu Ser Thr Glu Ala 
1255 1260 1265 

AAT GGC GCC AAA CCT CAA AGT GTA TGG ATC GAG CGT ACG GTA GAT TTG 4797 
Asn Gly Ala Lys Pro Gln Ser Val Trp Ile Glu Arg Thr Val Asp Leu 
1270 1275 1280 

CCT GCG GGC ACG AAG TAT GTT GCT TTC CGT CAC TAC AAT TGC TCG GAT 4845 
Pro Ala Gly Thr Lys Tyr Val Ala Phe Arg His Tyr Asn Cys Ser Asp 
1285 1290 1295 

TTG AAC TAC ATT CTT TTG GAT GAT ATT CAG TTC ACC ATG GGT GGC AGC 4893 
Leu Asn Tyr Ile Leu Leu Asp Asp Ile Gln Phe Thr Met Gly Gly Ser 
1300 1305 1310 1315 



CCC ACC CCG ACC GAT TAT ACC TAC ACG GTG TAT CGT GAC GGT ACG AAG 4941 
Pro Thr Pro Thr Asp Tyr Thr Tyr Thr Val Tyr Arg Asp Gly Thr Lys 
1320 1325 1330 

ATC AAG GAA GGT CTG ACC GAA ACG ACC TTC GAA GAA GAC GGC GTA GCT 4989 
Ile Lys Glu Gly Leu Thr Glu Thr Thr Phe Glu Glu Asp Gly Val Ala 
1335 1340 1345 

ACA GGC AAT CAT GAG TAT TGC GTG GAA GTG AAG TAC ACA GCC GGC GTA 5037 
Thr Gly Asn His Glu Tyr Cys Val Glu Val Lys Tyr Thr Ala Gly Val 
1350 1355 1360 

TCT CCG AAA GAG TGC GTA AAC GTA ACT ATT AAT CCG ACT CAG TTC AAT 5085 
Ser Pro Lys Glu Cys Val Asn Val Thr Ile Asn Pro Thr Gln Phe Asn 
1365 1370 1375 

CCT GTA AAG AAC CTG AAG GCA CAA CCG GAT GGC GGC GAC GTG GTT CTC 5133 
Pro Val Lys Asn Leu Lys Ala Gln Pro Asp Gly Gly Asp Val Val Leu 
1380 1385 1390 1395 

AAG TGG GAA GCC CCG AGC GCA AAA AAG ACA GAA GGT TCT CGT GAA GTA 5181 
Lys Trp Glu Ala Pro Ser Ala Lys Lys Thr Glu Gly Ser Arg Glu Val 
1400 1405 1410 

AAA CGG ATC GGA GAC GGT CTT TTC GTT ACG ATC GAA CCT GCA AAC GAT 5229 
Lys Arg Ile Gly Asp Gly Leu Phe Val Thr Ile Glu Pro Ala Asn Asp 
1415 1420 1425 

GTA CGT GCC AAC GAA GCC AAG GTT GTG CTC GCA GCA GAC AAC GTA TGG 5277 
Val Arg Ala Asn Glu Ala Lys Val Val Leu Ala Ala Asp Asn Val Trp 
1430 1435 1440 

GGA GAC AAT ACG GGT TAC CAG TTC TTG TTG GAT GCC GAT CAC AAT ACA 5325 
Gly Asp Asn Thr Gly Tyr Gln Phe Leu Leu Asp Ala Asp His Asn Thr 
1445 1450 1455 

TTC GGA AGT GTC ATT CCG GCA ACC GGT CCT CTC TTT ACC GGA ACA GCT 5373 
Phe Gly Ser Val Ile Pro Ala Thr Gly Pro Leu Phe Thr Gly Thr Ala 
1460 1465 1470 1475 

TCT TCC AAT CTT TAC AGT GCG AAC TTC GAG TAT TTG ATC CCG GCC AAT 5421 
Ser Ser Asn Leu Tyr Ser Ala Asn Phe Glu Tyr Leu Ile Pro Ala Asn 
1480 1485 1490 

GCC GAT CCT GTT GTT ACT ACA CAG AAT ATT ATC GTT ACA GGA CAG GGT 5469 
Ala Asp Pro Val Val Thr Thr Gln Asn Ile Ile Val Thr Gly Gln Gly 
1495 1500 1505 

GAA GTT GTA ATC CCC GGT GGT GTT TAC GAC TAT TGC ATT ACG AAC CCG 5517 
Glu Val Val Ile Pro Gly Gly Val Tyr Asp Tyr Cys Ile Thr Asn Pro 
1510 1515 1520 

GAA CCT GCA TCC GGA AAG ATG TGG ATC GCA GGA GAT GGA GGC AAC CAG 5565 
Glu Pro Ala Ser Gly Lys Met Trp Ile Ala Gly Asp Gly Gly Asn Gln 
1525 1530 1535 

CCT GCA CGT TAT GAC GAT TTC ACA TTC GAA GCA GGC AAG AAG TAC ACC 5613 
Pro Ala Arg Tyr Asp Asp Phe Thr Phe Glu Ala Gly Lys Lys Tyr Thr 
1540 1545 1550 1555 

11° * TC CGT CGC GCC GGA ATG GGA GAT GGA ACT GAT ATG GAA GTC 5661 

Phe Thr Met Arg Arg Ala Gly Met Gly Asp Gly Thr Asp Met Glu Val 
1560 1565 1570 

GAA GAC GAT TCA CCT GCA AGC TAT ACC TAT ACA GTC TAT CGT GAC GGC 5709 
Glu Asp Asp Ser Pro Ala Ser Tyr Thr Tyr Thr Val Tyr Arg Asp Gly 
1575 1580 1585 

ACG AAG ATC AAG GAA GGT CTG ACC GAA ACG ACC TAC CGC GAT GCA GGA 5757 
Thr Lys Ile Lys Glu Gly Leu Thr Glu Thr Thr Tyr Arg Asp Ala Gly 
1590 1595 1600 

ATG AGT GCA CAA TCT CAT GAG TAT TGC GTA GAG GTT AAG TAC GCA GCC 5805 
Met Ser Ala Gln Ser His Glu Tyr Cys Val Glu Val Lys Tyr Ala Ala 
1605 1610 1615 

GGC GTA TCT CCG AAG GTT TGT GTG GAT TAT ATT CCT GAC GGA GTG GCA 5853 
Gly Val Ser Pro Lys Val Cys Val Asp Tyr Ile Pro Asp Gly Val Ala 
1620 1625 1630 1635 

GAC GTA ACG GCT CAG AAG CCT TAC ACG CTG ACA GTT GTT GGA AAG ACG 5901 
Asp Val Thr Ala Gln Lys Pro Tyr Thr Leu Thr Val Val Gly Lys Thr 
1640 1645 1650 

ATC ACG GTA ACT TGC CAA GGC GAA GCT ATG ATC TAC GAC ATG AAC GGT 5949 
Ile Thr Val Thr Cys Gln Gly Glu Ala Met Ile Tyr Asp Met Asn Gly 
1655 1660 1665 

CGT CGT CTG GCA GCC GGT CGC AAC ACA GTT GTT TAC ACG GCT CAG GGC 5997 
Arg Arg Leu Ala Ala Gly Arg Asn Thr Val Val Tyr Thr Ala Gln Gly 
1670 1675 1680 

GGC TAC TAT GCA GTC ATG GTT GTC GTT GAC GGC AAG TCT TAC GTA GAG 6045 
Gly Tyr Tyr Ala Val Met Val Val Val Asp Gly Lys Ser Tyr Val Glu 
1685 1690 1695 



AAA CTC GCT GTA AAG TAATTCTGTC TTGGACTCGG AGACTTTGTG CAGACACTTT 6100 
Lys Leu Ala Val Lys 
1700 170 



TAATATAGGT CTGTAATTGT CTCAGAGTAT GAATCGATCG CCCGACCTCC TTTTAAGGAA 6160 

GTCTGGGCGA CTTCGTTTTT ATGCCTATTA TTCTAATATA CTTCTGAAAC AATTTGTTCC 6220 

AAAAAGTTGC ATGAAAAGAT TATCTTACTA TCTTTGCACT GCAAAAGGGG AGTTTCCTAA 6280 

GGTTTTCCCC GGAGTAGTAC GGTAATAACG GTGTGGTAGT TCAGCTGGTT AGAATACCTG 6340 

CCTGTCACGC AGGGGGTCGC GGGTTCGAGT CCCGTCCATA CCGCTAAATA GCTGAAAGAT 6400 

AGGCTATAGG TCATCTGAAG CAATTTTAGA AACGAATCCA AAAGCGTCTT AATTCCAACG 6460 

AATTAAGGCG CTTTTTCTTT GTCGCCACCC CACACGTCGG ATGAGGTTCG GAATAGGCGT 6520 

ATATTCCGTA AATATGCCTC CGGTGGTTCC ATTTTGGTTA CAAAAAACAA AGGGGCTGAA 6580 

AATTGTAACC ACAGACGACG TTAAGACGAT GTTTAGACGA TTGACAAATT ACTCTGTTTC 6640 

AAAATCATAT GTCGAACTTT GTAGCCGTAT GGTTACACTA ATTTTGGAGC AAAATGAAGA 6700 

GTCAATTTCG TTCAGTTTTT TACTTGCGCA GCAATTACAT CAACAAAGAA GGTAAAACTC 6760 

CTGTCCTTAT TCGTATTTAT CTGAATAAGG AACGCCTGTC GTTGGGTTCG ACAGGGCTGG 6820 

CTGTTAATCC CATACAATGG GATTCAGAAA AAGAGAAAGT CAAAGGACAT AGTGCAGAAG 6880 

CACTTGAAGT CAATCGAAAG ATCGAAGAAA TCAGGGCTGA TATTCTGACC ATTTACAAAC 6940 

GTTTGGAAGT AACAGTAGAT GATTTGACGC CGGAGAGGAT CAAATCGGAA TACTGCGGAC 7000 

AGACGGATAC ATTAAACAGT ATAGTGGAAC TTTTCGATAA ACATAACGAG GATGTCCGGG 7060 

CCCAGGTGGG AATCAATAAA ACGGCTGCCA CTTTACAAAA ATACGAAAAC AGCAAACGGC 7120 

ATTTTACCCG ATTCCTCAAA GCGAAGTACA ACAGAACGGA TCTCAAATTC TCAGAGCTTA 7180 

CCCCGTTGGT CATTCATAAC TTTGAGATAT ATCTGCTGAC TGTAGCCCAT TGTTGCCCGA 7240 

ATACGGCAAC CAAAATCTTG AAGCTT 7266 


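The CDS feature of SEQ ID NO: 10 (LOCATION: 949..6063) and the length stated for SEQ ID NO: 11 (1704 amino acids) can be cross-checked by simple arithmetic. The Python sketch below is illustrative only and is not part of the original listing; it assumes the standard genetic code, and the names `CODON_TABLE` and `translate` are introduced purely for the example. It also translates the first and last codon groups of the coding sequence as they are printed in the listing.

```python
from itertools import product

# Standard genetic code (assumed here), with codons enumerated in the
# conventional T, C, A, G order: TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna: str) -> str:
    """Translate a DNA string (spaces ignored) into one-letter amino acids."""
    seq = dna.replace(" ", "").upper()
    usable = len(seq) - len(seq) % 3  # drop any incomplete trailing codon
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, usable, 3))


# CDS coordinates as given in the FEATURE block of SEQ ID NO: 10.
CDS_START, CDS_END = 949, 6063
cds_length = CDS_END - CDS_START + 1   # nucleotides, inclusive coordinates
codons = cds_length // 3               # includes the stop codon
residues = codons - 1                  # encoded amino acids

first = translate("ATG AAA AAC")               # codons at positions 949-957
last = translate("AAA CTC GCT GTA AAG TAA")    # codons at positions 6046-6063

print(cds_length, codons, residues)  # 5115 1705 1704
print(first, last)                   # MKN KLAVK*
```

The 1704 residues obtained this way agree with the length stated for SEQ ID NO: 11, and the terminal residues (Met Lys Asn at the start, Lys Leu Ala Val Lys before the stop codon) agree with the translation printed under the nucleotide sequence.
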

(2) INFORMATION FOR SEQ ID NO: 11: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1704 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11: 

Met Lys Asn Leu Asn Lys Phe Val Ser Ile Ala Leu Cys Ser Ser Leu 
1 5 10 15 

Leu Gly Gly Met Ala Phe Ala Gln Gln Thr Glu Leu Gly Arg Asn Pro 
20 25 30 

Asn Val Arg Leu Leu Glu Ser Thr Gln Gln Ser Val Thr Lys Val Gln 
35 40 45 

Phe Arg Met Asp Asn Leu Lys Phe Thr Glu Val Gln Thr Pro Lys Gly 
50 55 60 

Ile Gly Gln Val Pro Thr Tyr Thr Glu Gly Val Asn Leu Ser Glu Lys 
65 70 75 80 

Gly Met Pro Thr Leu Pro Ile Leu Ser Arg Ser Leu Ala Val Ser Asp 
85 90 95 

Thr Arg Glu Met Lys Val Glu Val Val Ser Ser Lys Phe Ile Glu Lys 
100 105 110 

Lys Asn Val Leu Ile Ala Pro Ser Lys Gly Met Ile Met Arg Asn Glu 
115 120 125 

Asp Pro Lys Lys Ile Pro Tyr Val Tyr Gly Lys Ser Tyr Ser Gln Asn 
130 135 140 

Lys Phe Phe Pro Gly Glu Ile Ala Thr Leu Asp Asp Pro Phe Ile Leu 
145 150 155 160 

Arg Asp Val Arg Gly Gln Val Val Asn Phe Ala Pro Leu Gln Tyr Asn 
165 170 175 

Pro Val Thr Lys Thr Leu Arg Ile Tyr Thr Glu Ile Thr Val Ala Val 
180 185 190 

Ser Glu Thr Ser Glu Gln Gly Lys Asn Ile Leu Asn Lys Lys Gly Thr 
195 200 205 

Phe Ala Gly Phe Glu Asp Thr Tyr Lys Arg Met Phe Met Asn Tyr Glu 
210 215 220 

Pro Gly Arg Tyr Thr Pro Val Glu Glu Lys Gln Asn Gly Arg Met Ile 
225 230 235 240 

Val Ile Val Ala Lys Lys Tyr Glu Gly Asp Ile Lys Asp Phe Val Asp 
245 250 255 

Trp Lys Asn Gln Arg Gly Leu Arg Thr Glu Val Lys Val Ala Glu Asp 
260 265 270 

Ile Ala Ser Pro Val Thr Ala Asn Ala Ile Gln Gln Phe Val Lys Gln 
275 280 285 

Glu Tyr Glu Lys Glu Gly Asn Asp Leu Thr Tyr Val Leu Leu Val Gly 
290 295 300 

Asp His Lys Asp Ile Pro Ala Lys Ile Thr Pro Gly Ile Lys Ser Asp 
305 310 315 320 

Gln Val Tyr Gly Gln Ile Val Gly Asn Asp His Tyr Asn Glu Val Phe 
325 330 335 

Ile Gly Arg Phe Ser Cys Glu Ser Lys Glu Asp Leu Lys Thr Gln Ile 
340 345 350 

Asp Arg Thr Ile His Tyr Glu Arg Asn Ile Thr Thr Glu Asp Lys Trp 
355 360 365 

Leu Gly Gln Ala Leu Cys Ile Ala Ser Ala Glu Gly Gly Pro Ser Ala 
370 375 380 

Asp Asn Gly Glu Ser Asp Ile Gln His Glu Asn Val Ile Ala Asn Leu 
385 390 395 400 

Leu Thr Gln Tyr Gly Tyr Thr Lys Ile Ile Lys Cys Tyr Asp Pro Gly 
405 410 415 

Val Thr Pro Lys Asn Ile Ile Asp Ala Phe Asn Gly Gly Ile Ser Leu 
420 425 430 

Val Asn Tyr Thr Gly His Gly Ser Glu Thr Ala Trp Gly Thr Ser His 
435 440 445 

Phe Gly Thr Thr His Val Lys Gln Leu Thr Asn Ser Asn Gln Leu Pro 
450 455 460 

Phe Ile Phe Asp Val Ala Cys Val Asn Gly Asp Phe Leu Phe Ser Met 
465 470 475 480 

Pro Cys Phe Ala Glu Ala Leu Met Arg Ala Gln Lys Asp Gly Lys Pro 
485 490 495 

Thr Gly Thr Val Ala Ile Ile Ala Ser Thr Ile Asn Gln Ser Trp Ala 
500 505 510 

Ser Pro Met Arg Gly Gln Asp Glu Met Asn Glu Ile Leu Cys Glu Lys 
515 520 525 

His Pro Asn Asn Ile Lys Arg Thr Phe Gly Gly Val Thr Met Asn Gly 
530 535 540 

Met Phe Ala Met Val Glu Lys Tyr Lys Lys Asp Gly Glu Lys Met Leu 
545 550 555 560 

Asp Thr Trp Thr Val Phe Gly Asp Pro Ser Leu Leu Val Arg Thr Leu 
565 570 575 

Val Pro Thr Lys Met Gln Val Thr Ala Pro Ala Gln Ile Asn Leu Thr 
580 585 590 

Asp Ala Ser Val Asn Val Ser Cys Asp Tyr Asn Gly Ala Ile Ala Thr 
595 600 605 

Ile Ser Ala Asn Gly Lys Met Phe Gly Ser Ala Val Val Glu Asn Gly 
610 615 620 

Thr Ala Thr Ile Asn Leu Thr Gly Leu Thr Asn Glu Ser Thr Leu Thr 
625 630 635 640 

Leu Thr Val Val Gly Tyr Asn Lys Glu Thr Val Ile Lys Thr Ile Asn 
645 650 655 

Thr Asn Gly Glu Pro Asn Pro Tyr Gln Pro Val Ser Asn Leu Thr Ala 
660 665 670 

Thr Thr Gln Gly Gln Lys Val Thr Leu Lys Trp Asp Ala Pro Ser Thr 
675 680 685 

Lys Thr Asn Ala Thr Thr Asn Thr Ala Arg Ser Val Asp Gly Ile Arg 
690 695 700 

Glu Leu Val Leu Leu Ser Val Ser Asp Ala Pro Glu Leu Leu Arg Ser 
705 710 715 720 

Gly Gln Ala Glu Ile Val Leu Glu Ala His Asp Val Trp Asn Asp Gly 
725 730 735 

Ser Gly Tyr Gln Ile Leu Leu Asp Ala Asp His Asp Gln Tyr Gly Gln 
740 745 750 

Val Ile Pro Ser Asp Thr His Thr Leu Trp Pro Asn Cys Ser Val Pro 
755 760 765 

Ala Asn Leu Phe Ala Pro Phe Glu Tyr Thr Val Pro Glu Asn Ala Asp 
770 775 780 

Pro Ser Cys Ser Pro Thr Asn Met Ile Met Asp Gly Thr Ala Ser Val 
785 790 795 800 

Asn Ile Pro Ala Gly Thr Tyr Asp Phe Ala Ile Ala Ala Pro Gln Ala 
805 810 815 

Asn Ala Lys Ile Trp Ile Ala Gly Gln Gly Pro Thr Lys Glu Asp Asp 
820 825 830 

Tyr Val Phe Glu Ala Gly Lys Lys Tyr His Phe Leu Met Lys Lys Met 
835 840 845 

Gly Ser Gly Asp Gly Thr Glu Leu Thr Ile Ser Glu Gly Gly Gly Ser 
850 855 860 

Asp Tyr Thr Tyr Thr Val Tyr Arg Asp Gly Thr Lys Ile Lys Glu Gly 
865 870 875 880 

Leu Thr Ala Thr Thr Phe Glu Glu Asp Gly Val Ala Thr Gly Asn His 
885 890 895 

Glu Tyr Cys Val Glu Val Lys Tyr Thr Ala Gly Val Ser Pro Lys Val 
900 905 910 

Cys Lys Asp Val Thr Val Glu Gly Ser Asn Glu Phe Ala Pro Val Gln 
915 920 925 

Asn Leu Thr Gly Ser Ala Val Gly Gln Lys Val Thr Leu Lys Trp Asp 
930 935 940 

Ala Pro Asn Gly Thr Pro Asn Pro Asn Pro Asn Pro Asn Pro Asn Pro 
945 950 955 960 

Gly Thr Thr Thr Leu Ser Glu Ser Phe Glu Asn Gly Ile Pro Ala Ser 
965 970 975 

Trp Lys Thr Ile Asp Ala Asp Gly Asp Gly His Gly Trp Lys Pro Gly 
980 985 990 

Asn Ala Pro Gly Ile Ala Gly Tyr Asn Ser Asn Gly Cys Val Tyr Ser 
995 1000 1005 

Glu Ser Phe Gly Leu Gly Gly Ile Gly Val Leu Thr Pro Asp Asn Tyr 
1010 1015 1020 

Leu Ile Thr Pro Ala Leu Asp Leu Pro Asn Gly Gly Lys Leu Thr Phe 
1025 1030 1035 1040 

Trp Val Cys Ala Gln Asp Ala Asn Tyr Ala Ser Glu His Tyr Ala Val 
1045 1050 1055 

Tyr Ala Ser Ser Thr Gly Asn Asp Ala Ser Asn Phe Thr Asn Ala Leu 
1060 1065 1070 

Leu Glu Glu Thr Ile Thr Ala Lys Gly Val Arg Ser Pro Glu Ala Ile 
1075 1080 1085 

Arg Gly Arg Ile Gln Gly Thr Trp Arg Gln Lys Thr Val Asp Leu Pro 
1090 1095 1100 

Ala Gly Thr Lys Tyr Val Ala Phe Arg His Phe Gln Ser Thr Asp Met 
1105 1110 1115 1120 

Phe Tyr Ile Asp Leu Asp Glu Val Glu Ile Lys Ala Asn Gly Lys Arg 
1125 1130 1135 

Ala Asp Phe Thr Glu Thr Phe Glu Ser Ser Thr His Gly Glu Ala Pro 
1140 1145 1150 

Ala Glu Trp Thr Thr Ile Asp Ala Asp Gly Asp Gly Gln Gly Trp Leu 
1155 1160 1165 

Cys Leu Ser Ser Gly Gln Leu Asp Trp Leu Thr Ala His Gly Gly Thr 
1170 1175 1180 

Asn Val Val Ala Ser Phe Ser Trp Asn Gly Met Ala Leu Asn Pro Asp 
1185 1190 1195 1200 

Asn Tyr Leu Ile Ser Lys Asp Val Thr Gly Ala Thr Lys Val Lys Tyr 
1205 1210 1215 

Tyr Tyr Ala Val Asn Asp Gly Phe Pro Gly Asp His Tyr Ala Val Met 
1220 1225 1230 

Ile Ser Lys Thr Gly Thr Asn Ala Gly Asp Phe Thr Val Val Phe Glu 
1235 1240 1245 

Glu Thr Pro Asn Gly He Asn Lys Gly Gly Ala Arg Phe Gly Leu Ser 
1250 1255 1260 

Thr Glu Ala Asn Gly Ala Lys Pro Gin Ser Val Trp He Glu Arg Thr 
1265 1270 1275 1280 

Val Asp Leu Pro Ala Gly Thr Lys Tyr Val Ala Phe Arg His Tyr Asn 
1285 1290 1295 

Cys Ser Asp Leu Asn Tyr Ile Leu Leu Asp Asp Ile Gln Phe Thr Met
1300 1305 1310 

Gly Gly Ser Pro Thr Pro Thr Asp Tyr Thr Tyr Thr Val Tyr Arg Asp 
1315 1320 1325 

Gly Thr Lys Ile Lys Glu Gly Leu Thr Glu Thr Thr Phe Glu Glu Asp
1330 1335 1340

Gly Val Ala Thr Gly Asn His Glu Tyr Cys Val Glu Val Lys Tyr Thr
1345 1350 1355 1360

Ala Gly Val Ser Pro Lys Glu Cys Val Asn Val Thr Ile Asn Pro Thr
1365 1370 1375

Gln Phe Asn Pro Val Lys Asn Leu Lys Ala Gln Pro Asp Gly Gly Asp
1380 1385 1390 

Val Val Leu Lys Trp Glu Ala Pro Ser Ala Lys Lys Thr Glu Gly Ser 
1395 1400 1405

Arg Glu Val Lys Arg Ile Gly Asp Gly Leu Phe Val Thr Ile Glu Pro
1410 1415 1420 

Ala Asn Asp Val Arg Ala Asn Glu Ala Lys Val Val Leu Ala Ala Asp 
1425 1430 1435 1440 

Asn Val Trp Gly Asp Asn Thr Gly Tyr Gln Phe Leu Leu Asp Ala Asp
1445 1450 1455

His Asn Thr Phe Gly Ser Val Ile Pro Ala Thr Gly Pro Leu Phe Thr
1460 1465 1470

Gly Thr Ala Ser Ser Asn Leu Tyr Ser Ala Asn Phe Glu Tyr Leu Ile
1475 1480 1485

Pro Ala Asn Ala Asp Pro Val Val Thr Thr Gln Asn Ile Ile Val Thr
1490 1495 1500

Gly Gln Gly Glu Val Val Ile Pro Gly Gly Val Tyr Asp Tyr Cys Ile
1505 1510 1515 1520

Thr Asn Pro Glu Pro Ala Ser Gly Lys Met Trp Ile Ala Gly Asp Gly
1525 1530 1535

Gly Asn Gln Pro Ala Arg Tyr Asp Asp Phe Thr Phe Glu Ala Gly Lys
1540 1545 1550 

Lys Tyr Thr Phe Thr Met Arg Arg Ala Gly Met Gly Asp Gly Thr Asp 
1555 1560 1565 

Met Glu Val Glu Asp Asp Ser Pro Ala Ser Tyr Thr Tyr Thr Val Tyr
1570 1575 1580

Arg Asp Gly Thr Lys Ile Lys Glu Gly Leu Thr Glu Thr Thr Tyr Arg
1585 1590 1595 1600

Asp Ala Gly Met Ser Ala Gln Ser His Glu Tyr Cys Val Glu Val Lys
1605 1610 1615

Tyr Ala Ala Gly Val Ser Pro Lys Val Cys Val Asp Tyr Ile Pro Asp
1620 1625 1630

Gly Val Ala Asp Val Thr Ala Gln Lys Pro Tyr Thr Leu Thr Val Val
1635 1640 1645

Gly Thr Ile Thr Val Thr Cys Gln Gly Glu Ala Met Ile Tyr Asp
1650 1655 1660

Met Asn Gly Arg Arg Leu Ala Ala Gly Arg Asn Thr Val Val Tyr Thr 
1665 1670 1675 1680 

Ala Gln Gly Gly Tyr Tyr Ala Val Met Val Val Val Asp Gly Lys Ser
1685 1690 1695 

Tyr Val Glu Lys Leu Ala Val Lys 
1700

WE CLAIM:

1. A recombinant DNA molecule comprising a nucleotide sequence
encoding an Arg-gingipain protein having an amino acid sequence
selected from the group consisting of sequences as given in
SEQ ID NO:4 from amino acid 1 through amino acid 510, SEQ ID NO:11
from amino acid 228 through amino acid 719, and an amino acid
sequence having at least about 85% amino acid sequence identity
with a sequence as given in SEQ ID NO:11 from amino acid 228 to
amino acid 719.

2. The recombinant DNA molecule of claim 1, wherein said
nucleotide sequence is as given in one of SEQ ID NO:3 from
nucleotide 1630 through nucleotide 3105 and SEQ ID NO:10 from
nucleotide 1630 through nucleotide 3105.

3. A recombinant DNA molecule comprising a nucleic acid portion
encoding a high molecular weight Arg-gingipain comprising an
enzymatically active protease component and a hemagglutinin
component.

4. The recombinant DNA molecule of claim 3 wherein said encoded
high molecular weight Arg-gingipain has an enzymatically active
protease component having an amino acid sequence as given in one
of SEQ ID NO:4 from amino acid 1 to amino acid 510 and
SEQ ID NO:11 from amino acid 228 to amino acid 719.

5. The recombinant DNA molecule of claim 4 wherein said high
molecular weight Arg-gingipain has an enzymatically active
protease component having an amino acid sequence as given in
SEQ ID NO:11 from amino acid 228 to amino acid 719 and a
hemagglutinin component having an amino acid sequence selected
from the group consisting of the sequences from amino acid 720 to
amino acid 1091, from amino acid 1092 to amino acid 1429 and from
amino acid 1430 to amino acid 1704, each as given in SEQ ID NO:11.

6. The recombinant DNA molecule of claim 4 wherein said mature
enzymatically active protease component is encoded by a
nucleotide sequence as given in one of SEQ ID NO:3 from
nucleotide 1630 to nucleotide 3105, in SEQ ID NO:10 from
nucleotide 1630 to nucleotide 3105 or a nucleotide sequence
having at least 70% homology to one of said sequences.

7. The recombinant DNA molecule of claim 1 wherein said
Arg-gingipain is encoded within a nucleotide sequence as given in
SEQ ID NO:10 from nucleotide 949 to nucleotide 6063, or a
nucleotide sequence having at least about 70% sequence homology
thereto.

8. The recombinant DNA molecule of claim 7 wherein said
Arg-gingipain is expressed as a prepolyprotein having an amino
acid sequence as given in SEQ ID NO:11.

9. The recombinant DNA molecule of claim 8 wherein the nucleotide
sequence encoding said polyprotein is as given in SEQ ID NO:10
from nucleotide 949 to nucleotide 6063.